LocalLLaMA

A First Comprehensive Study of TurboQuant: Accuracy and Performance

TL;DR from the article: FP8 via --kv-cache-dtype fp8 remains the best default for KV-cache quantization: it provides 2x KV-cache capacity with negligible accuracy loss, while matching BF16 on most performance metrics and substantially improving …
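
For anyone wondering where the "2x KV-cache capacity" figure comes from, here is a rough back-of-the-envelope sketch. It only counts the raw K and V tensors and ignores any per-tensor scale factors or allocator overhead. The model shape (32 layers, 8 KV heads, head dim 128) and the 16 GiB cache budget are made-up illustrative values, not numbers from the article, and the function names are hypothetical.

```python
# Rough KV-cache sizing sketch. All model dimensions and the memory budget
# below are assumptions chosen for illustration, not values from the article.

def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_elem: int) -> int:
    """Bytes needed to cache one token: a K and a V vector per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(budget_bytes: int, bytes_per_token: int) -> int:
    """How many tokens fit in a fixed KV-cache memory budget."""
    return budget_bytes // bytes_per_token


# Hypothetical model shape (assumption).
NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM = 32, 8, 128
# Hypothetical memory left over for the KV cache (assumption): 16 GiB.
BUDGET_BYTES = 16 * 1024**3

bf16_per_token = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, 2)
fp8_per_token = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, 1)

bf16_tokens = max_cached_tokens(BUDGET_BYTES, bf16_per_token)
fp8_tokens = max_cached_tokens(BUDGET_BYTES, fp8_per_token)

if __name__ == "__main__":
    print(f"BF16: {bf16_per_token} bytes/token, {bf16_tokens} tokens")
    print(f"FP8:  {fp8_per_token} bytes/token, {fp8_tokens} tokens")
    print(f"Capacity ratio: {fp8_tokens / bf16_tokens:.1f}x")
```

With these assumed numbers, BF16 needs 131072 bytes per token and FP8 needs 65536, so the same budget holds 131072 tokens in BF16 and 262144 in FP8. The ratio is 2x regardless of the model shape, since only the bytes per element change (2 for BF16, 1 for FP8).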