/u/GregoryfromtheHood

llama.cpp Gemma 4 using up all system RAM on larger prompts

/u/GregoryfromtheHood / April 6, 2026

I'm noticing something I don't think I've seen before. I've been testing Gemma 4 31B with 32GB of VRAM and 64GB of DDR5. I can load the UD_Q5_K_XL Unsloth quant with about 100k context and plenty of VRAM headroom, but wh…
