Making Every Verified Token Count: Adaptive Verification for MoE Speculative Decoding
arXiv:2605.00342v1 Announce Type: new
Abstract: Tree-based speculative decoding accelerates autoregressive generation by verifying multiple draft candidates in parallel, but this advantage weakens for sparse Mixture-of-Experts (MoE) models. As the dra…
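
As an illustration of what "verifying multiple draft candidates in parallel" means, here is a minimal sketch of greedy verification over a draft tree. It is not the paper's method: the function name `verify_draft_tree`, the parent-array tree encoding, and the `target_next` list (assumed to hold the target model's greedy prediction at every node, obtained from one parallel forward pass) are hypothetical choices made for this example.

```python
from typing import Dict, List


def verify_draft_tree(
    tokens: List[int],
    parents: List[int],
    target_next: List[int],
) -> List[int]:
    """Greedy verification of a draft tree (illustrative sketch only).

    tokens[i]      : draft token at node i; node 0 is the root and stands
                     for the last already-committed token
    parents[i]     : index of node i's parent, -1 for the root
    target_next[i] : token the target model predicts after the path that
                     ends at node i (assumed precomputed for all nodes)

    Returns the accepted draft tokens followed by one token chosen by the
    target model itself.
    """
    # Build a child list for each node from the parent array.
    children: Dict[int, List[int]] = {}
    for node, parent in enumerate(parents):
        if parent >= 0:
            children.setdefault(parent, []).append(node)

    accepted: List[int] = []
    node = 0
    while True:
        wanted = target_next[node]
        # Look for a drafted child that matches the target's prediction.
        match = next(
            (c for c in children.get(node, []) if tokens[c] == wanted),
            None,
        )
        if match is None:
            # No draft matches: emit the target's token and stop.
            accepted.append(wanted)
            return accepted
        accepted.append(tokens[match])
        node = match


# Hypothetical 5-node tree: the root (token 10) has children 11 and 12;
# node 1 (token 11) has children 13 and 14.
result = verify_draft_tree(
    tokens=[10, 11, 12, 13, 14],
    parents=[-1, 0, 0, 1, 1],
    target_next=[11, 14, 99, 50, 77],
)
print(result)  # [11, 14, 77]
```

In this toy run the target model scores all four drafted nodes, but only two of them (tokens 11 and 14) lie on the accepted path; the other two are verified and then discarded.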