RACER: Retrieval-Augmented Contextual Rapid Speculative Decoding
arXiv:2604.14885v1
Abstract: Autoregressive decoding in Large Language Models (LLMs) generates one token per step, causing high inference latency. Speculative decoding (SD) mitigates this through a guess-and-verify strategy, but exi…
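The guess-and-verify strategy the abstract describes can be illustrated with a minimal sketch. This is not RACER's method, just generic speculative decoding with toy stand-ins: `draft_model` and `target_model` are hypothetical deterministic next-token functions, so verification reduces to an exact-match check, and a mismatch is repaired with the target model's own token so every round still makes progress.

```python
# Toy sketch of speculative (guess-and-verify) decoding.
# Both "models" below are hypothetical stand-ins, not real LLMs.

def draft_model(tokens):
    # Cheap proposer: next token = (last token + 1) mod 10.
    return (tokens[-1] + 1) % 10

def target_model(tokens):
    # Expensive verifier: agrees with the draft except at every
    # 4th context length, where it diverges.
    nxt = (tokens[-1] + 1) % 10
    return nxt if len(tokens) % 4 else (nxt + 5) % 10

def speculative_decode(prompt, steps=8, k=4):
    tokens = list(prompt)
    while len(tokens) < len(prompt) + steps:
        # Guess: draft k tokens autoregressively with the cheap model.
        draft, ctx = [], tokens[:]
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # Verify: accept the longest drafted prefix the target model
        # agrees with; at the first disagreement, emit the target's
        # token instead (guaranteeing at least one token per round).
        accepted, ctx = [], tokens[:]
        for t in draft:
            expected = target_model(ctx)
            if t == expected:
                accepted.append(t)
                ctx.append(t)
            else:
                accepted.append(expected)
                break
        tokens.extend(accepted)
    return tokens[:len(prompt) + steps]

print(speculative_decode([0], steps=8, k=4))
# → [0, 1, 2, 3, 9, 0, 1, 2, 8]
```

One draft pass plus one verification pass here yields up to `k` tokens, which is the source of SD's latency savings: the expensive model runs once per round rather than once per token.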