KaVa: Latent Reasoning via Compressed KV-Cache Distillation
arXiv:2510.02312v2
Abstract: Large Language Models (LLMs) excel at multi-step reasoning problems with explicit chain-of-thought (CoT), but verbose traces incur significant computational costs and memory overhead, and often carry…