Llama.cpp’s auto fit works much better than I expected

I always thought that with 32GB of VRAM, the biggest models I could run were around 20GB, like Qwen3.5 27B at Q4 or Q6. I was under the impression that everything had to fit in VRAM or I'd get 2 t/s. Man, was I wrong. I just tested Qwen3.6 Q8 with 256k context on …
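
For anyone who wants to try the same thing, here's a minimal sketch of what a launch can look like. The model filename and layer count are placeholders, and exactly how "auto fit" is exposed depends on your llama.cpp build, so treat this as a starting point rather than the definitive flags:

```bash
# Minimal sketch, assuming a GPU-enabled llama.cpp build.
# "qwen-q8.gguf" is a placeholder path; 262144 = 256k context.

# Manual split: -ngl offloads that many transformer layers to the
# GPU and keeps the rest on the CPU. That partial offload is why a
# model bigger than VRAM can still run far faster than 2 t/s.
./llama-server -m qwen-q8.gguf -c 262144 -ngl 30

# On builds with the auto-fit feature, launching without an explicit
# -ngl should let llama.cpp pick the split itself (an assumption
# about your build; check `llama-server --help` to confirm).
./llama-server -m qwen-q8.gguf -c 262144
```

The general rule of thumb: the more layers fit in VRAM, the faster generation gets, and auto fit just finds that boundary for you instead of making you binary-search `-ngl` values by hand.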