Speculative decoding question, 665% speed increase

I'm using these settings in llama.cpp: --spec-type ngram-map-k --spec-ngram-size-n 24 --draft-min 12 --draft-max 48

What's the real reason the speedup differs so much between models? Say the prompt asks for "minor changes in code":
Gemma 4 31b: generation tokens/s doubles, so a 100% increase
Qwen 3.6: only 40% more speed
Devstral Small: 665% increase in speed (what?)
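
To make the question concrete, here is a rough sketch of how I understand n-gram drafting in general. It is not llama.cpp's actual ngram-map-k code, and the function names are made up for illustration. The drafter looks for an earlier occurrence of the last n tokens and proposes whatever followed it, so the more a model repeats the existing code verbatim, the more draft tokens should get accepted. The second helper just converts a percent increase into a multiplier (a 665% increase is about 7.65x).

```python
def ngram_draft(tokens, n, draft_min, draft_max):
    """Hypothetical n-gram drafter: propose the tokens that followed the most
    recent earlier occurrence of the last n tokens. Assumed behaviour only,
    not taken from llama.cpp."""
    if len(tokens) <= n:
        return []
    key = tokens[-n:]
    # Scan backwards over earlier positions, skipping the suffix itself.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == key:
            draft = tokens[start + n:start + n + draft_max]
            # Only return a draft if it is long enough to be worth verifying.
            return draft if len(draft) >= draft_min else []
    return []


def pct_increase_to_multiplier(pct):
    """Convert a percent increase in tokens/s into a speed multiplier."""
    return 1 + pct / 100


if __name__ == "__main__":
    history = list("abcdefgXabc")  # toy "tokens": single characters
    print(ngram_draft(history, n=3, draft_min=2, draft_max=4))
    for pct in (100, 40, 665):
        print(pct, round(pct_increase_to_multiplier(pct), 2))
```

If this picture is roughly right, the per-model difference would come down to how often each model's output matches text already in the context, but that is my guess and not something I have measured.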

submitted by /u/GodComplecs