Jorge L. Ruiz Williams

HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization

Jorge L. Ruiz Williams / May 6, 2026

arXiv:2605.03562v1 Announce Type: cross
Abstract: KV-cache quantizers usually optimize storage-space reconstruction, even though attention reads keys through logits and values through attention-weighted readout. We argue that persistent cache error sh…
