why can’t llama.cpp combine speculative decoding methods?
Messing around with the new MTP speculative decoding with qwen3.6 27b, and it’s great. But for agentic coding I’ve seen significant speedups from n-gram drafting, because a decent fraction of the time (e.g. when calling the edit tool) the model is just repeating verbatim…
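For context, n-gram drafting (sometimes called prompt-lookup decoding) exploits exactly this repetition: match the tail of the context against earlier occurrences and propose the tokens that followed as a free draft, which the model then verifies in one batched forward pass. A minimal sketch of the lookup step, in Python rather than llama.cpp’s actual C++ implementation, with made-up parameter names:

```python
def ngram_draft(tokens, max_ngram=3, num_draft=8):
    """Propose draft tokens by matching the tail of the context
    against an earlier occurrence (n-gram / prompt-lookup drafting).

    tokens: the token ids generated/prompted so far.
    Returns up to num_draft speculative tokens, or [] if no match.
    """
    # Prefer longer n-gram matches: they are more likely to predict
    # a genuine verbatim repeat (e.g. re-emitting file contents).
    for n in range(max_ngram, 0, -1):
        if len(tokens) <= n:
            continue
        tail = tokens[-n:]
        # Scan backwards for the most recent earlier occurrence.
        for i in range(len(tokens) - n - 1, -1, -1):
            if tokens[i:i + n] == tail:
                # Speculate that what followed then will follow now.
                return tokens[i + n:i + n + num_draft]
    return []
```

The drafted tokens cost nothing to produce (no draft model), and the main model accepts or rejects them during verification, so the worst case is just a slightly larger batch. That cheapness is also why combining it with a learned drafter (like MTP) seems attractive: use the n-gram draft when it matches, fall back to MTP otherwise.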