LocalLLaMA

Speculative decoding works great for Gemma 4 31B in llama.cpp

I get a ~11% speed up with Gemma 3 270B as the draft model. Try it by adding: –no-mmproj -hfd unsloth/gemma-3-270m-it-GGUF:Q8_0 Testing with (on a 3090): ./build/bin/llama-cli -hf unsloth/gemma-4-31B-it-GGUF:Q4_1 –jinja –temp 1.0 –top-p 0.95 –top…