Acceptance Dynamics Across Cognitive Domains in Speculative Decoding
arXiv:2604.14682v1 Announce Type: cross
Abstract: Speculative decoding accelerates large language model (LLM) inference. It uses a small draft model to propose a tree of future tokens. A larger target model then verifies these tokens in a single batch…
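The draft-then-verify loop described in the abstract can be illustrated with a minimal toy sketch. Everything here is an assumption made for illustration and not the paper's implementation: the names `build_draft_tree`, `verify`, `toy_target_next`, and `toy_draft_topk` are hypothetical, the "models" are deterministic functions over integer tokens, and acceptance is greedy exact match. A real system would score all tree nodes in one batched target forward pass; the sketch replaces that with a plain loop.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins: a target "model" maps a prefix to its greedy next
# token; a draft "model" maps a prefix to a short list of candidate tokens.
NextToken = Callable[[Sequence[int]], int]
TopK = Callable[[Sequence[int]], List[int]]


def build_draft_tree(draft_topk: TopK, prefix: Sequence[int], depth: int) -> List[List[int]]:
    """Enumerate every root-to-leaf path of the draft token tree."""
    paths: List[List[int]] = [[]]
    for _ in range(depth):
        paths = [p + [t] for p in paths for t in draft_topk(list(prefix) + p)]
    return paths


def verify(target_next: NextToken, prefix: Sequence[int],
           paths: List[List[int]]) -> Tuple[List[int], int]:
    """Keep the longest draft path prefix that the target agrees with.

    Assumption: greedy acceptance (a draft token is accepted only if it equals
    the target's own greedy choice). Returns the accepted tokens plus the one
    extra token the target contributes after the last accepted position.
    """
    best: List[int] = []
    for path in paths:
        accepted: List[int] = []
        for tok in path:
            if target_next(list(prefix) + accepted) != tok:
                break  # first disagreement ends this path
            accepted.append(tok)
        if len(accepted) > len(best):
            best = accepted
    bonus = target_next(list(prefix) + best)
    return best, bonus


def toy_target_next(prefix: Sequence[int]) -> int:
    """Toy target: always continues with (last token + 1) mod 10."""
    return (prefix[-1] + 1) % 10


def toy_draft_topk(prefix: Sequence[int]) -> List[int]:
    """Toy draft: usually includes the target's token, but misses it
    whenever the last token is congruent to 2 mod 3."""
    last = prefix[-1]
    if last % 3 == 2:
        return [(last + 2) % 10, (last + 3) % 10]
    return [(last + 1) % 10, (last + 2) % 10]


prefix = [0]
paths = build_draft_tree(toy_draft_topk, prefix, depth=3)
accepted, bonus = verify(toy_target_next, prefix, paths)
print(accepted, bonus)  # prints: [1, 2] 3
```

In this toy run the target accepts two drafted tokens and adds one of its own, so a single verification step yields three tokens instead of one. The number of accepted tokens per step is the quantity that acceptance statistics for speculative decoding are built from.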