When Hidden States Drift: Can KV Caches Rescue Long-Range Speculative Decoding?
arXiv:2604.26412v1 Announce Type: new
Abstract: Speculative decoding accelerates LLM inference, but state-of-the-art hidden-state-based drafters suffer from long-range decay: draft accuracy degrades as the speculative step increases. Existing work attributes this…