I'm using these settings in llama.cpp: --spec-type ngram-map-k --spec-ngram-size-n 24 --draft-min 12 --draft-max 48
What's the real reason the speedup differs so much between models? Say the prompt asks for "minor changes in code"; this is what I get:
Gemma 4 31b: generation speed (tokens/s) doubles, so +100%
Qwen 3.6: only 40% faster
Devstral Small: 665% increase in speed (what?)
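
For context on what might be going on: n-gram speculation drafts tokens by copying from text already in the context, so the gain depends on how often the model's real output matches that copy. Below is a toy Python sketch of the idea, not llama.cpp's actual implementation. The function names, the tiny token lists, and the small n / draft_max values are all made up for illustration.

```python
def draft_from_ngram(history, n, draft_max):
    """Find the most recent earlier occurrence of the last n tokens of
    `history` and propose the tokens that followed it (up to draft_max)."""
    if len(history) < n:
        return []
    key = history[-n:]
    # Walk backwards, skipping the trailing n-gram itself.
    for start in range(len(history) - n - 1, -1, -1):
        if history[start:start + n] == key:
            return history[start + n:start + n + draft_max]
    return []


def count_accepted(draft, target):
    """Count leading draft tokens that match what the model really emits."""
    accepted = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        accepted += 1
    return accepted


# Hypothetical context: the original code, then the start of the model's reply.
history = ["a", "b", "c", "d", "e", "f", "g", "h", "a", "b", "c"]
draft = draft_from_ngram(history, n=3, draft_max=4)

# A model that reproduces the unchanged code verbatim accepts the whole draft.
copy_heavy = count_accepted(draft, ["d", "e", "f", "g"])
# A model that rewords or reformats early throws most of the draft away.
rewriting = count_accepted(draft, ["d", "x", "f", "g"])
print(draft, copy_heavy, rewriting)
```

So one plausible factor is simply how closely each model's edited output repeats the code it was given: the more it copies verbatim, the more drafted tokens are accepted per verification step.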