cs.LG

Low-Rank Key-Value Attention

arXiv:2601.11471v3 Announce Type: replace
Abstract: The key-value (KV) cache is a primary memory bottleneck in Transformers. We propose Low-Rank Key-Value (LRKV) attention, which reduces KV cache memory by exploiting redundancy across attention heads,…
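The abstract is cut off before the method is spelled out, so the exact LRKV formulation is not shown here. As a rough illustration of the stated idea (exploiting redundancy across attention heads to shrink the KV cache), the sketch below caches a single shared rank-r latent per token and reconstructs each head's keys and values from it with small per-head up-projections. All names (`kv_down`, `k_up`, `v_up`), the rank parameter, and the overall factorization are illustrative assumptions, not the paper's design; causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn

class LowRankKVAttention(nn.Module):
    """Hypothetical sketch: attention with a shared low-rank KV cache.

    Instead of caching per-head keys/values (n_heads * d_head floats per
    token, twice), we cache one rank-r latent per token and expand it back
    to each head's K and V with small per-head matrices.
    """

    def __init__(self, d_model: int, n_heads: int, rank: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        # Shared down-projection; its output is the only thing cached.
        self.kv_down = nn.Linear(d_model, rank, bias=False)
        # Per-head up-projections from the shared latent to K and V.
        self.k_up = nn.Parameter(torch.randn(n_heads, rank, self.d_head) * rank**-0.5)
        self.v_up = nn.Parameter(torch.randn(n_heads, rank, self.d_head) * rank**-0.5)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x, kv_cache=None):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        latent = self.kv_down(x)  # [b, t, rank] -- stored in place of full K/V
        if kv_cache is not None:
            latent = torch.cat([kv_cache, latent], dim=1)
        # Reconstruct per-head keys/values from the shared low-rank latent.
        k = torch.einsum("bsr,hrd->bhsd", latent, self.k_up)
        v = torch.einsum("bsr,hrd->bhsd", latent, self.v_up)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head**0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out), latent  # latent is the updated KV cache
```

Under these assumptions, the cache holds rank floats per token per layer instead of 2 * n_heads * d_head, so any rank well below the total KV width yields a proportional memory reduction.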