Author name: Willy Fitra Hendria

Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes

Willy Fitra Hendria / May 15, 2026

arXiv:2605.08913v2 Announce Type: replace
Abstract: Autoregressive inference is typically assumed to scale predictably with decoding length, with latency increasing smoothly as generated sequence length grows. In this work, we identify unexpected non-…

cs.AR, cs.CL, cs.LG, cs.PF

Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes

Willy Fitra Hendria / May 12, 2026

arXiv:2605.08913v1 Announce Type: new
Abstract: Autoregressive inference is typically assumed to scale predictably with decoding length, and key-value (KV) caching is widely regarded as a universally beneficial optimization for accelerating decoding. …