Anna Kuzina, Maciej Pioro, Paul N. Whatmough, Babak Ehteshami Bejnordi

KaVa: Latent Reasoning via Compressed KV-Cache Distillation

May 8, 2026

arXiv:2510.02312v2
Abstract: Large Language Models (LLMs) excel at multi-step reasoning problems with explicit chain-of-thought (CoT), but verbose traces incur significant computational costs and memory overhead, and often carry…
