cs.AI, cs.CC, cs.LG

How Much Cache Does Reasoning Need? Depth-Cache Tradeoffs in KV-Compressed Transformers

arXiv:2604.17935v1 Announce Type: new
Abstract: The key-value (KV) cache is the dominant memory bottleneck during Transformer inference, yet little is known theoretically about how aggressively it can be compressed before multi-step reasoning degrades…
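To put the memory claim in concrete terms, here is a back-of-the-envelope sketch of KV-cache sizing; the model configuration (32 layers, 32 KV heads, head dimension 128, fp16 — roughly Llama-2-7B-like) is an illustrative assumption, not a figure from the paper:

```python
# Back-of-the-envelope KV-cache sizing. All model numbers below are
# illustrative assumptions (Llama-2-7B-like), not taken from the paper.

BYTES_PER_ELEM = 2   # assumed fp16/bf16 cache
N_LAYERS = 32        # assumed decoder depth
N_KV_HEADS = 32      # assumed (no grouped-query attention)
HEAD_DIM = 128       # assumed per-head dimension

def kv_cache_bytes(seq_len: int, batch: int = 1) -> int:
    """KV-cache size: 2 (K and V) x layers x heads x head_dim
    x tokens x batch x bytes per element."""
    return (2 * N_LAYERS * N_KV_HEADS * HEAD_DIM
            * seq_len * batch * BYTES_PER_ELEM)

if __name__ == "__main__":
    for seq_len in (4_096, 32_768, 131_072):
        gib = kv_cache_bytes(seq_len) / 2**30
        print(f"{seq_len:>7} tokens -> {gib:6.1f} GiB of KV cache")
```

Under these assumptions the cache costs 0.5 MiB per token, so a single 131K-token sequence needs about 64 GiB, several times the ~13.5 GiB of fp16 weights for a 7B model. That gap is why the cache, rather than the weights, is the bottleneck at long context, and why the compression question the abstract raises matters.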