Speculative Speculative Decoding
arXiv:2603.03251v3 Announce Type: replace
Abstract: Autoregressive decoding is bottlenecked by its sequential nature. Speculative decoding has become a standard way to accelerate inference by using a fast draft model to predict upcoming tokens from a …