FibQuant: Universal Vector Quantization for Random-Access KV-Cache Compression
arXiv:2605.11478v1 Announce Type: cross
Abstract: Long-context inference is increasingly a memory-traffic problem. The culprit is the key–value (KV) cache: it grows with context length, batch size, layers, and heads, and it is read at every decoding …
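The abstract's claim that the KV cache "grows with context length, batch size, layers, and heads" is just a product of those dimensions. As a rough illustration (not from the paper; the model shape below is an assumed Llama-7B-like configuration), a minimal sketch of the arithmetic:

```python
def kv_cache_bytes(batch: int, seq_len: int, n_layers: int,
                   n_kv_heads: int, head_dim: int,
                   bytes_per_elem: int = 2) -> int:
    """Total KV-cache size in bytes.

    The factor of 2 accounts for storing both keys and values;
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return 2 * batch * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem


# Assumed illustrative shape: 32 layers, 32 KV heads, head_dim 128,
# a 32k-token context, batch 1, fp16.
size = kv_cache_bytes(batch=1, seq_len=32768, n_layers=32,
                      n_kv_heads=32, head_dim=128)
print(f"{size / 2**30:.0f} GiB")  # 16 GiB for this configuration
```

Because the whole cache is read once per generated token, that 16 GiB also approximates the memory traffic per decoding step, which is why compression schemes like the one announced here target it.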