TAPS: Task Aware Proposal Distributions for Speculative Sampling
arXiv:2603.27027v1 Announce Type: new
Abstract: Speculative decoding accelerates autoregressive generation by letting a lightweight draft model propose future tokens that a larger target model then verifies in parallel. In practice, however, draft mod…
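
The draft-then-verify step described in the abstract can be illustrated with the standard speculative sampling acceptance rule: a drafted token x is accepted with probability min(1, p(x)/q(x)), where q is the draft distribution and p is the target distribution, and on the first rejection a replacement is drawn from the normalized residual max(0, p - q). The following is a minimal sketch of that generic rule, not of the TAPS method itself. It assumes toy distributions given as plain Python lists, and all function names (`residual_distribution`, `sample`, `verify_draft`) are hypothetical.

```python
import random


def residual_distribution(p, q):
    """Return normalized max(0, p - q), used to resample after a rejection."""
    raw = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(raw)
    # If p == q the residual is all zeros; a rejection cannot occur in that
    # case, so falling back to p is only a safety net.
    return [r / total for r in raw] if total > 0 else list(p)


def sample(dist, rng):
    """Draw one token index from a categorical distribution."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def verify_draft(draft_tokens, draft_dists, target_dists, rng):
    """Verify drafted tokens against the target model's distributions.

    draft_dists[i]  : draft distribution q at position i (assumed input)
    target_dists[i] : target distribution p at position i; it holds one
                      extra entry used for the bonus token when every
                      drafted token is accepted.
    Returns the accepted prefix plus exactly one token chosen by the target.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        p, q = target_dists[i], draft_dists[i]
        # q[tok] > 0 because tok is assumed to have been sampled from q.
        accept_prob = min(1.0, p[tok] / q[tok])
        if rng.random() < accept_prob:
            out.append(tok)
        else:
            # First rejection: resample from the residual and stop.
            out.append(sample(residual_distribution(p, q), rng))
            return out
    # All drafted tokens accepted: take one bonus token from the target.
    out.append(sample(target_dists[len(draft_tokens)], rng))
    return out
```

In a real system the target distributions for all drafted positions would come from a single parallel forward pass of the target model; here they are passed in directly to keep the sketch self-contained. The closer the draft distribution q is to the target distribution p, the more drafted tokens survive verification.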