cs.CL

Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Windowing

arXiv:2605.14978v1 Announce Type: new
Abstract: Speculative decoding accelerates LLM inference by having a lightweight draft model propose speculative windows of candidate tokens for parallel verification by a larger target model. In practice, specula…
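The draft-then-verify loop described in the abstract's opening sentence can be illustrated with a minimal sketch. This is not the paper's method: it assumes greedy decoding, a fixed window size, and toy stand-in models (`toy_target` and `toy_draft` are hypothetical functions over integer tokens). The adaptive windowing policy is not specified in the excerpt above and is not modeled here.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token context to the next token.
NextToken = Callable[[List[int]], int]


def speculative_step(
    prefix: List[int], draft_next: NextToken, target_next: NextToken, window: int
) -> Tuple[List[int], int]:
    """Run one draft-then-verify round; return (new tokens, drafts accepted)."""
    # Draft phase: the cheap model proposes `window` tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(window):
        tok = draft_next(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # Verify phase: a real target model scores all positions in one parallel
    # pass; this toy version checks them one by one for clarity.
    new_tokens: List[int] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target_next(ctx)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            new_tokens.append(expected)
            return new_tokens, len(new_tokens) - 1
        new_tokens.append(tok)
        ctx.append(tok)

    # Every draft token matched, so the target contributes one extra token.
    new_tokens.append(target_next(ctx))
    return new_tokens, window


def generate(
    prompt: List[int],
    draft_next: NextToken,
    target_next: NextToken,
    window: int,
    max_new_tokens: int,
) -> List[int]:
    """Greedy generation that matches target-only decoding token for token."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        new_tokens, _ = speculative_step(out, draft_next, target_next, window)
        out.extend(new_tokens)
    return out[: len(prompt) + max_new_tokens]


def toy_target(ctx: List[int]) -> int:
    # Hypothetical target model: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    # Hypothetical draft model: agrees with the target except after token 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

In this sketch every round emits at least one token from the target, so the output is identical to decoding with the target alone. The window size only changes how many target passes are needed, which is the quantity a window-selection policy would try to reduce.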