The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference
arXiv:2604.15409v1 Announce Type: cross
Abstract: KV caching is a ubiquitous optimization in autoregressive transformer inference, long presumed to be numerically equivalent to cache-free computation. This assumption fails under standard FP16 precision…
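A minimal sketch (not from the paper) of the underlying numerical mechanism: FP16 addition is not associative, so reducing the same terms in a different order can yield different results. Since a KV-cached decode step and a cache-free forward pass can accumulate the same attention terms in different orders, bit-for-bit equivalence is not guaranteed. The specific values below are illustrative assumptions chosen to make the rounding visible.

```python
import numpy as np

# FP16 spacing at 2048 is 2, so 2048 + 1 rounds back down to 2048
# (round-to-nearest-even). Accumulation order therefore changes the sum.
ones = np.ones(16, dtype=np.float16)
big = np.float16(2048.0)

# Order A: fold each 1 into the large value one at a time;
# every intermediate step rounds the contribution away.
acc = big
for v in ones:
    acc = np.float16(acc + v)

# Order B: sum the small values first (exact in FP16), then add the large value.
alt = np.float16(ones.sum(dtype=np.float16) + big)

print(float(acc), float(alt))  # 2048.0 vs 2064.0: same terms, different FP16 sums
```

The same effect arises at much larger scale inside attention reductions, where kernel choice and cache layout fix the accumulation order.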