/u/Leopold_Boom - Provide.ai

Speculative decoding works great for Gemma 4 31B in llama.cpp

/u/Leopold_Boom / April 4, 2026

I get a ~11% speed up with Gemma 3 270B as the draft model. Try it by adding: –no-mmproj -hfd unsloth/gemma-3-270m-it-GGUF:Q8_0 Testing with (on a 3090): ./build/bin/llama-cli -hf unsloth/gemma-4-31B-it-GGUF:Q4_1 –jinja –temp 1.0 –top-p 0.95 –top…

Author name: /u/Leopold_Boom

Speculative decoding works great for Gemma 4 31B in llama.cpp