cs.AI, cs.IT, cs.LG, math.IT

Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective

arXiv:2604.25975v1 Announce Type: cross
Abstract: Key-value (KV) caching is essential for large language model inference, yet its memory overhead poses a critical bottleneck for long-context generation. Existing eviction policies predominantly rely on…