cs.LG

Training Transformers for KV Cache Compressibility

arXiv:2605.05971v1 Announce Type: new
Abstract: Long-context language modeling is increasingly constrained by the Key-Value (KV) cache, whose memory and decode-time access costs scale linearly with the prefix length. This bottleneck has motivated a ra…
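To make the linear scaling concrete, here is a back-of-envelope KV cache size calculation. The model configuration (a generic 7B-class transformer) is illustrative and not taken from the paper:

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # Keys and values each store one head_dim vector per token,
    # per layer, per KV head -- hence the factor of 2 and the
    # linear dependence on seq_len (the prefix length).
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Illustrative config: 32 layers, 32 KV heads, head_dim 128, fp16 (2 bytes).
per_token = kv_cache_bytes(1, 32, 32, 128)            # 524288 bytes = 0.5 MiB/token
at_128k = kv_cache_bytes(128_000, 32, 32, 128) / 2**30  # 62.5 GiB at a 128k prefix
```

At roughly 0.5 MiB per cached token for this configuration, a 128k-token prefix already consumes tens of gigabytes, which is the memory and bandwidth bottleneck the abstract refers to.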