cs.AI, cs.CC, cs.LG

How Much Cache Does Reasoning Need? Depth-Cache Tradeoffs in KV-Compressed Transformers

arXiv:2604.17935v1 Announce Type: new
Abstract: The key-value (KV) cache is the dominant memory bottleneck during Transformer inference, yet little is known theoretically about how aggressively it can be compressed before multi-step reasoning degrades…
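To put the memory claim in concrete terms, here is a back-of-the-envelope sketch of KV-cache sizing; the model configuration (32 layers, 32 KV heads, head dimension 128, fp16 — roughly Llama-2-7B-like) is an illustrative assumption, not a figure from the paper:

```python
# Back-of-the-envelope KV-cache sizing. All model numbers below are
# illustrative assumptions (Llama-2-7B-like), not taken from the paper.

BYTES_PER_ELEM = 2   # assumed fp16/bf16 cache
N_LAYERS = 32        # assumed decoder depth
N_KV_HEADS = 32      # assumed (no grouped-query attention)
HEAD_DIM = 128       # assumed per-head dimension

def kv_cache_bytes(seq_len: int, batch: int = 1) -> int:
    """KV-cache size: 2 (K and V) x layers x heads x head_dim
    x tokens x batch x bytes per element."""
    return (2 * N_LAYERS * N_KV_HEADS * HEAD_DIM
            * seq_len * batch * BYTES_PER_ELEM)

if __name__ == "__main__":
    for seq_len in (4_096, 32_768, 131_072):
        gib = kv_cache_bytes(seq_len) / 2**30
        print(f"{seq_len:>7} tokens -> {gib:6.1f} GiB of KV cache")
```

Under these assumptions the cache costs 0.5 MiB per token, so a single 131K-token sequence needs about 64 GiB, several times the ~13.5 GiB of fp16 weights for a 7B model. That gap is why the cache, rather than the weights, is the bottleneck at long context, and why the compression question the abstract raises matters.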