Fei Zuo, Zikang Zhou, Hao Cong, Xiaoyan Xi, Ho Fai Leung

RateQuant: Optimal Mixed-Precision KV Cache Quantization via Rate-Distortion Theory

Fei Zuo, Zikang Zhou, Hao Cong, Xiaoyan Xi, Ho Fai Leung / May 11, 2026

arXiv:2605.06675v1 Announce Type: cross
Abstract: Large language models cache all previously computed key-value (KV) pairs during generation, and this KV cache grows linearly with sequence length, making it a primary memory bottleneck for serving. Qua…
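
A back-of-the-envelope calculation makes the linear growth concrete. The sketch below assumes a standard decoder layout that stores one key and one value vector per KV head, per layer, per token. The model configuration is hypothetical: it is roughly the shape of a 7B-parameter decoder with full multi-head attention, and it is not taken from the paper. `kv_cache_bytes` is an illustrative helper, not part of any library. The size ignores quantization metadata such as scales and zero points. The 4-bit figure is only a uniform-bit-width reference point and does not represent RateQuant's mixed-precision allocation.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1, bits: float = 16.0) -> float:
    """Approximate KV cache size in bytes (hypothetical helper).

    The factor of 2 counts both keys and values at every layer.
    `bits` may be fractional, e.g. an average over mixed-precision channels.
    Quantization metadata (scales, zero points) is ignored.
    """
    elements = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return elements * bits / 8


# Hypothetical config: 32 layers, 32 KV heads, head_dim 128 (assumed, not from the paper).
cfg = dict(num_layers=32, num_kv_heads=32, head_dim=128)

for seq_len in (4096, 8192, 32768):
    fp16 = kv_cache_bytes(**cfg, seq_len=seq_len)            # 16-bit baseline
    int4 = kv_cache_bytes(**cfg, seq_len=seq_len, bits=4)    # uniform 4-bit reference
    print(f"seq_len={seq_len:>6}: fp16 {fp16 / 2**30:.1f} GiB, 4-bit {int4 / 2**30:.2f} GiB")
# seq_len=  4096: fp16 2.0 GiB, 4-bit 0.50 GiB
# seq_len=  8192: fp16 4.0 GiB, 4-bit 1.00 GiB
# seq_len= 32768: fp16 16.0 GiB, 4-bit 4.00 GiB
```

Under these assumptions, each token costs 0.5 MiB at 16 bits. Doubling the context therefore doubles the cache, and this growth is independent of the model's weight memory.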
