How Much Cache Does Reasoning Need? Depth-Cache Tradeoffs in KV-Compressed Transformers
arXiv:2604.17935v1 Announce Type: new
Abstract: The key-value (KV) cache is the dominant memory bottleneck during Transformer inference, yet little is known theoretically about how aggressively it can be compressed before multi-step reasoning degrades…
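The abstract's premise that the KV cache dominates inference memory is easy to see with a back-of-the-envelope estimate. The sketch below uses the standard KV-cache size formula; the model configuration values are illustrative assumptions, not taken from the paper:

```python
# Rough KV-cache size estimate: per token, each layer stores one key and one
# value vector for every KV head. (Config values below are assumptions.)
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Example: a Llama-2-7B-like configuration (32 layers, 32 KV heads,
# head_dim 128) at 4k context, batch 1, fp16 (2 bytes per element):
size = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128,
                      seq_len=4096, batch=1)
print(f"{size / 2**30:.2f} GiB")  # → 2.00 GiB
```

At longer contexts or larger batches this grows linearly, which is why compressing the cache without hurting multi-step reasoning is the tradeoff the paper studies.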