The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference
arXiv:2604.15409v1 Announce Type: cross
Abstract: KV caching is a ubiquitous optimization in autoregressive transformer inference, long presumed to be numerically equivalent to cache-free computation. This assumption fails under standard FP16 precision…
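A minimal sketch (not from the paper) of the underlying numerical mechanism: FP16 addition is not associative, so reducing the same terms in a different order can yield different results. Since a KV-cached decode step and a cache-free forward pass can accumulate the same attention terms in different orders, bit-for-bit equivalence is not guaranteed. The specific values below are illustrative assumptions chosen to make the rounding visible.

```python
import numpy as np

# FP16 spacing at 2048 is 2, so 2048 + 1 rounds back down to 2048
# (round-to-nearest-even). Accumulation order therefore changes the sum.
ones = np.ones(16, dtype=np.float16)
big = np.float16(2048.0)

# Order A: fold each 1 into the large value one at a time;
# every intermediate step rounds the contribution away.
acc = big
for v in ones:
    acc = np.float16(acc + v)

# Order B: sum the small values first (exact in FP16), then add the large value.
alt = np.float16(ones.sum(dtype=np.float16) + big)

print(float(acc), float(alt))  # 2048.0 vs 2064.0: same terms, different FP16 sums
```

The same effect arises at much larger scale inside attention reductions, where kernel choice and cache layout fix the accumulation order.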