cs.AI, cs.IT, math.IT, stat.ML

FibQuant: Universal Vector Quantization for Random-Access KV-Cache Compression

arXiv:2605.11478v1 Announce Type: cross
Abstract: Long-context inference is increasingly a memory-traffic problem. The culprit is the key–value (KV) cache: it grows with context length, batch size, layers, and heads, and it is read at every decoding …
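The abstract's scaling claim can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the function name and the example model configuration (a 32-layer, 32-head, fp16 decoder) are assumptions, not details from the paper.

```python
def kv_cache_bytes(batch: int, seq_len: int, n_layers: int,
                   n_heads: int, head_dim: int, dtype_bytes: int = 2) -> int:
    """Uncompressed KV-cache size in bytes.

    Factor of 2 accounts for storing both keys and values; the cache
    grows linearly in every one of these dimensions, which is why it
    dominates memory traffic at long context lengths.
    """
    return 2 * batch * seq_len * n_layers * n_heads * head_dim * dtype_bytes

# Hypothetical 7B-class config: 32 layers, 32 heads, head_dim 128, fp16.
size = kv_cache_bytes(batch=1, seq_len=4096, n_layers=32, n_heads=32, head_dim=128)
print(f"{size / 2**30:.2f} GiB")  # → 2.00 GiB for a single 4k-token sequence
```

Every decoding step rereads this entire cache, so at batch size 8 and 32k context the same formula already gives 128 GiB of reads per generated token, which motivates compressing the cache rather than the weights.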