cs.AI, cs.CL

TAPS: Task Aware Proposal Distributions for Speculative Sampling

arXiv:2603.27027v1 Announce Type: new
Abstract: Speculative decoding accelerates autoregressive generation by letting a lightweight draft model propose future tokens that a larger target model then verifies in parallel. In practice, however, draft mod…
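
The draft-then-verify scheme described in the abstract's first sentence can be illustrated with a minimal sketch of the standard speculative sampling acceptance rule. This is not taken from the paper: the function name `speculative_step`, its arguments, and the use of plain probability lists in place of real model outputs are all assumptions made for illustration. Each drafted token is accepted with probability min(1, p/q), where p and q are the target and draft probabilities of that token. On the first rejection, a replacement is drawn from the normalized residual max(0, p - q).

```python
import random


def speculative_step(p_target, q_draft, draft_tokens, rng):
    """Verify one block of drafted tokens (hypothetical helper, not from the paper).

    p_target: len(draft_tokens) + 1 target distributions, each a list of probabilities.
    q_draft:  len(draft_tokens) draft distributions, one per drafted token.
    draft_tokens: token ids sampled from the draft distributions.
    Returns the accepted prefix plus exactly one extra token.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        p, q = p_target[i], q_draft[i]
        # Accept the drafted token with probability min(1, p[tok] / q[tok]).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
            continue
        # Rejected: resample from the residual max(0, p - q).
        # random.choices normalizes the weights internally.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        out.append(rng.choices(range(len(p)), weights=residual)[0])
        return out
    # Every draft token was accepted: draw a bonus token from the target.
    last = p_target[-1]
    out.append(rng.choices(range(len(last)), weights=last)[0])
    return out
```

In this sketch, a draft that matches the target exactly has every proposed token accepted, while a draft that puts its mass on tokens the target assigns zero probability is rejected at the first position. For example, `speculative_step([[0.0, 1.0], [0.5, 0.5]], [[1.0, 0.0]], [0], random.Random(0))` rejects the drafted token and emits the target's token instead.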