cs.AI, cs.LG

Sparse Prefix Caching for Hybrid and Recurrent LLM Serving

arXiv:2605.05219v1 Announce Type: cross
Abstract: Prefix caching is a key latency optimization for autoregressive LLM serving, yet existing systems assume dense per-token key/value reuse. State-space models change the structure of the problem: a recur…
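The contrast the abstract draws can be made concrete with a toy sketch. In a Transformer, every token of a shared prefix leaves reusable KV blocks, so a cache can match any prefix length; in a recurrent/state-space model, the whole prefix is compressed into one fixed-size state, so reuse is only possible at positions where a state snapshot was explicitly saved. The sketch below illustrates that granularity gap; all names, the cache layout, and the checkpoint interval are illustrative assumptions, not the paper's API.

```python
"""Toy contrast: dense per-token KV reuse vs. sparse recurrent-state reuse.
Everything here is a hypothetical sketch, not the paper's implementation."""
from dataclasses import dataclass, field


@dataclass
class DenseKVCache:
    """Transformer-style cache: each token prefix maps to reusable KV blocks,
    so any shared prefix length can be matched and reused."""
    blocks: dict = field(default_factory=dict)  # prefix tuple -> KV blocks

    def insert(self, tokens: list, kv) -> None:
        self.blocks[tuple(tokens)] = kv

    def longest_reusable_prefix(self, tokens: list) -> int:
        # Per-token granularity: walk back until some cached prefix matches.
        for n in range(len(tokens), 0, -1):
            if tuple(tokens[:n]) in self.blocks:
                return n
        return 0


@dataclass
class RecurrentStateCache:
    """SSM-style cache: one fixed-size state summarizes the entire prefix,
    so reuse only succeeds at exact positions where a state was snapshotted."""
    checkpoints: dict = field(default_factory=dict)  # prefix tuple -> state
    interval: int = 64  # hypothetical checkpoint spacing

    def maybe_checkpoint(self, tokens: list, state) -> None:
        # Sparse policy: only save a state every `interval` tokens.
        if len(tokens) % self.interval == 0:
            self.checkpoints[tuple(tokens)] = state

    def longest_reusable_prefix(self, tokens: list) -> int:
        # Only exact checkpointed prefixes can be resumed from.
        for n in range(len(tokens), 0, -1):
            if tuple(tokens[:n]) in self.checkpoints:
                return n
        return 0
```

Under this sketch, the dense cache can resume from any previously seen prefix length, while the recurrent cache must recompute from the nearest saved checkpoint, which is the structural change the abstract attributes to state-space models.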