LocalLLaMA

llama.cpp Gemma 4 using up all system RAM on larger prompts

I'm seeing something I don't think I've noticed before. I've been testing Gemma 4 31B with 32GB of VRAM and 64GB of DDR5. I can load the UD_Q5_K_XL Unsloth quant with about 100k context and plenty of VRAM headroom, but wh…