cs.IT, cs.LG, cs.MS, math.IT

Statistical Inference and Quality Measures of KV Cache Quantisations Inspired by TurboQuant

arXiv:2605.08114v1 Announce Type: new
Abstract: We analyse three KV cache quantisation schemes under a fair bit budget: \textbf{KV} (scalar MSE baseline), \textbf{KQV} (WHT + MSE on $K$; WHT + MSE + QJL on $V$), and \textbf{QKQV} (WHT + MSE + QJL on b…
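
To make the "WHT + MSE" building block named in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes an orthonormal Walsh-Hadamard transform (WHT) applied to a single vector, followed by a plain uniform scalar quantiser standing in for the MSE-optimised one. The QJL stage is deliberately omitted. All function names (`fwht`, `quantize_uniform`, `wht_quantize_roundtrip`, `mse`) are hypothetical and chosen for illustration only.

```python
import math


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    y = list(x)
    h = 1
    while h < n:
        # Butterfly step: combine pairs that are h apart.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform orthonormal
    return [v * scale for v in y]


def quantize_uniform(x, bits):
    """Uniform scalar quantiser on [min(x), max(x)]; returns dequantised values.

    Assumption: a uniform grid is used here as a simple stand-in for an
    MSE-optimised scalar quantiser.
    """
    if bits < 1:
        raise ValueError("bits must be at least 1")
    lo, hi = min(x), max(x)
    if hi == lo:
        return [lo] * len(x)
    step = (hi - lo) / ((1 << bits) - 1)
    return [lo + round((v - lo) / step) * step for v in x]


def wht_quantize_roundtrip(x, bits):
    """Rotate with the WHT, quantise each coordinate, then rotate back."""
    rotated = fwht(x)
    dequantised = quantize_uniform(rotated, bits)
    # The orthonormal WHT is its own inverse.
    return fwht(dequantised)


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


if __name__ == "__main__":
    key = [0.3, -1.2, 0.7, 2.0, -0.5, 0.1, 1.4, -0.9]  # toy vector, not real data
    for bits in (2, 4, 8):
        print(bits, mse(key, wht_quantize_roundtrip(key, bits)))
```

Because the transform is orthonormal, the reconstruction error in the original coordinates has the same squared norm as the quantisation error in the rotated coordinates, which is why the error can be analysed coordinate-wise after the rotation.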