cs.AI, cs.LG

HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization

arXiv:2605.03562v1 Announce Type: cross
Abstract: KV-cache quantizers usually optimize storage-space reconstruction, even though attention reads keys through logits and values through attention-weighted readout. We argue that persistent cache error sh…
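
Illustrative sketch (not the paper's method): the abstract's first sentence contrasts storage-space reconstruction error with the error attention actually reads. The toy below measures both for the same quantized cache. The uniform round-to-nearest quantizer, the dimensions, and every function name are assumptions made for illustration only.

```python
import math
import random


def quantize(vec, bits=4):
    """Uniform round-to-nearest quantizer over the vector's own [min, max] range.

    Assumed baseline for illustration only; not the quantizer from the paper.
    """
    lo, hi = min(vec), max(vec)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [lo + round((x - lo) / scale) * scale for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_readout(q, keys, values):
    """Single-query scaled dot-product attention; returns (logits, output)."""
    d = len(q)
    logits = [dot(q, k) / math.sqrt(d) for k in keys]
    weights = softmax(logits)
    out = [sum(w * v[j] for w, v in zip(weights, values))
           for j in range(len(values[0]))]
    return logits, out


def distortions(q, keys, values, bits=4):
    """Compare storage-space error with the error visible through attention."""
    keys_q = [quantize(k, bits) for k in keys]
    values_q = [quantize(v, bits) for v in values]

    # Storage-space view: mean squared reconstruction error of the cached keys.
    n_elems = len(keys) * len(keys[0])
    storage_mse = sum((a - b) ** 2
                      for k, kq in zip(keys, keys_q)
                      for a, b in zip(k, kq)) / n_elems

    # Model-visible view: keys are read through logits, values through the readout.
    logits, out = attention_readout(q, keys, values)
    logits_q, out_q = attention_readout(q, keys_q, values_q)
    logit_err = max(abs(a - b) for a, b in zip(logits, logits_q))
    readout_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(out, out_q)))

    return {"storage_mse": storage_mse,
            "max_logit_err": logit_err,
            "readout_l2_err": readout_err}


if __name__ == "__main__":
    rng = random.Random(0)
    d, n = 16, 8  # hypothetical head dimension and cache length
    q = [rng.gauss(0, 1) for _ in range(d)]
    keys = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    values = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    for bits in (2, 4, 8):
        print(bits, distortions(q, keys, values, bits))
```

The three numbers need not move together: the logit error depends on how the key error aligns with the query, which a storage-space objective does not see.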