cs.AI, cs.IT, math.IT, stat.ML

FibQuant: Universal Vector Quantization for Random-Access KV-Cache Compression

arXiv:2605.11478v1 Announce Type: cross
Abstract: Long-context inference is increasingly a memory-traffic problem. The culprit is the key–value (KV) cache: it grows with context length, batch size, layers, and heads, and it is read at every decoding …
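The abstract's scaling claim can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the function name and the example model configuration (a 32-layer, 32-head, fp16 decoder) are assumptions, not details from the paper.

```python
def kv_cache_bytes(batch: int, seq_len: int, n_layers: int,
                   n_heads: int, head_dim: int, dtype_bytes: int = 2) -> int:
    """Uncompressed KV-cache size in bytes.

    Factor of 2 accounts for storing both keys and values; the cache
    grows linearly in every one of these dimensions, which is why it
    dominates memory traffic at long context lengths.
    """
    return 2 * batch * seq_len * n_layers * n_heads * head_dim * dtype_bytes

# Hypothetical 7B-class config: 32 layers, 32 heads, head_dim 128, fp16.
size = kv_cache_bytes(batch=1, seq_len=4096, n_layers=32, n_heads=32, head_dim=128)
print(f"{size / 2**30:.2f} GiB")  # → 2.00 GiB for a single 4k-token sequence
```

Every decoding step rereads this entire cache, so at batch size 8 and 32k context the same formula already gives 128 GiB of reads per generated token, which motivates compressing the cache rather than the weights.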