Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding
arXiv:2509.24328v2 Announce Type: replace
Abstract: LLMs have low GPU efficiency and high latency due to autoregressive decoding. Speculative decoding (SD) mitigates this using a small draft model to speculatively generate multiple tokens, which are t…
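For readers unfamiliar with the baseline, the draft-then-check loop commonly used in speculative decoding can be sketched as follows. This is a minimal greedy-matching illustration, not the Speculative Verification method announced here. The names `speculative_step`, `generate`, `toy_draft`, and `toy_target` are hypothetical, and the two "models" are toy functions over integer tokens standing in for real networks.

```python
from typing import Callable, List

# A "model" here is any function mapping a token context to the next token.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[int]:
    """Run one draft-then-check round and return the tokens to append."""
    # Draft phase: the small model proposes k tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # Check phase: a real system would score all k positions in one batched
    # pass of the large model; this sketch loops for clarity.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target_next(ctx)
        if tok != expected:
            # First mismatch: keep the large model's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # Every proposal matched, so the large model contributes one extra token.
    accepted.append(target_next(ctx))
    return accepted


def generate(prompt: List[int], draft_next: NextToken, target_next: NextToken,
             k: int, max_new_tokens: int) -> List[int]:
    """Repeat speculative rounds until max_new_tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[: len(prompt) + max_new_tokens]


# Toy stand-ins (assumptions for illustration): the target counts upward
# modulo 10, and the draft agrees except right after token 4.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


result = generate([0], toy_draft, toy_target, k=3, max_new_tokens=8)
```

In this greedy variant the output is identical to decoding with the large model alone; the draft only changes how many tokens each round yields (between 1 and k + 1).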