cs.AI, cs.LG

RDKV: Rate-Distortion Bit Allocation for Joint Eviction and Quantization of the KV Cache

arXiv:2605.08317v1 Announce Type: cross
Abstract: Large language models (LLMs) have shown strong performance across diverse tasks, but long-context inference is bottlenecked by memory capacity and bandwidth. The Key-Value (KV) cache size…
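To make the memory bottleneck concrete, a back-of-the-envelope estimate of the KV cache footprint helps. This is an illustrative sketch only: the function, its name, and the example configuration (a 7B-class decoder with 32 layers, 32 heads, head dimension 128, fp16 storage) are assumptions for illustration, not details from the paper.

```python
def kv_cache_bytes(num_layers: int, num_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size for a decoder-only transformer (hypothetical helper).

    Each layer stores two tensors (K and V), each of shape
    [batch, num_heads, seq_len, head_dim], at bytes_per_elem per value.
    """
    return 2 * num_layers * num_heads * head_dim * seq_len * batch * bytes_per_elem

# Assumed 7B-class config at fp16 with a 32k-token context:
size_gib = kv_cache_bytes(32, 32, 128, seq_len=32_768, batch=1) / 2**30
print(f"{size_gib:.1f} GiB")  # → 16.0 GiB
```

At this scale the cache alone rivals the model weights, which is why eviction (dropping entries) and quantization (shrinking `bytes_per_elem`) are the two levers a joint bit-allocation scheme like the one in the title would trade off.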