Speculative decoding question, 665% speed increase
I'm using these settings in llama.cpp: --spec-type ngram-map-k --spec-ngram-size-n 24 --draft-min 12 --draft-max 48. What's the real reason for the difference between models when, say, the prompt is for "minor changes in code"? Gemma 4 31b: Doub…