cs.LG

Low-Rank Key-Value Attention

arXiv:2601.11471v3 Announce Type: replace
Abstract: The key-value (KV) cache is a primary memory bottleneck in Transformers. We propose Low-Rank Key-Value (LRKV) attention, which reduces KV cache memory by exploiting redundancy across attention heads,…
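The abstract is cut off before the method is spelled out, so the exact LRKV formulation is not shown here. As a rough illustration of the stated idea (exploiting redundancy across attention heads to shrink the KV cache), the sketch below caches a single shared rank-r latent per token and reconstructs each head's keys and values from it with small per-head up-projections. All names (`kv_down`, `k_up`, `v_up`), the rank parameter, and the overall factorization are illustrative assumptions, not the paper's design; causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn

class LowRankKVAttention(nn.Module):
    """Hypothetical sketch: attention with a shared low-rank KV cache.

    Instead of caching per-head keys/values (n_heads * d_head floats per
    token, twice), we cache one rank-r latent per token and expand it back
    to each head's K and V with small per-head matrices.
    """

    def __init__(self, d_model: int, n_heads: int, rank: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        # Shared down-projection; its output is the only thing cached.
        self.kv_down = nn.Linear(d_model, rank, bias=False)
        # Per-head up-projections from the shared latent to K and V.
        self.k_up = nn.Parameter(torch.randn(n_heads, rank, self.d_head) * rank**-0.5)
        self.v_up = nn.Parameter(torch.randn(n_heads, rank, self.d_head) * rank**-0.5)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x, kv_cache=None):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        latent = self.kv_down(x)  # [b, t, rank] -- stored in place of full K/V
        if kv_cache is not None:
            latent = torch.cat([kv_cache, latent], dim=1)
        # Reconstruct per-head keys/values from the shared low-rank latent.
        k = torch.einsum("bsr,hrd->bhsd", latent, self.k_up)
        v = torch.einsum("bsr,hrd->bhsd", latent, self.v_up)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head**0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out), latent  # latent is the updated KV cache
```

Under these assumptions, the cache holds rank floats per token per layer instead of 2 * n_heads * d_head, so any rank well below the total KV width yields a proportional memory reduction.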