cs.CL

Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Windowing

arXiv:2605.14978v2 Announce Type: replace
Abstract: Speculative decoding accelerates LLM inference by having a lightweight draft model propose speculative windows of candidate tokens for parallel verification by a larger target model. In practice, spe…
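
The draft-then-verify loop described in the abstract can be sketched as follows. This is a minimal illustration and not the paper's method: it assumes greedy decoding, a fixed window size, and toy deterministic "models" that map a context to its next token. All names (`speculative_step`, `generate`, `toy_draft`, `toy_target`) are hypothetical. The adaptive windowing and policy optimization named in the title are not modeled here.

```python
from typing import Callable, List

# Assumption: a "model" is a function from a token context to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_step(ctx: List[int], draft: Model, target: Model, window: int) -> List[int]:
    """Run one draft-then-verify round and return the tokens to append."""
    # Draft phase: the lightweight model proposes `window` tokens autoregressively.
    proposal: List[int] = []
    for _ in range(window):
        proposal.append(draft(ctx + proposal))

    # Verify phase: a real system scores all positions in one parallel target pass;
    # this sketch simulates that with a loop over positions.
    accepted: List[int] = []
    for tok in proposal:
        expected = target(ctx + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest of the window.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every drafted token matched, so the target contributes one extra token.
    accepted.append(target(ctx + accepted))
    return accepted


def generate(prompt: List[int], draft: Model, target: Model, window: int, max_new: int) -> List[int]:
    """Repeat speculative steps until `max_new` tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(out, draft, target, window))
    return out[: len(prompt) + max_new]


def toy_target(ctx: List[int]) -> int:
    # Toy target: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    # Toy draft: agrees with the target except right after token 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

Under greedy acceptance, the output of `generate([0], toy_draft, toy_target, window=3, max_new=8)` matches what the target would produce decoding alone; the draft only changes how many target calls are spent per accepted token. A larger window pays off when the draft agrees with the target and wastes drafted tokens when it does not, which is the trade-off that choosing the window size has to balance.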