LocalLLaMA

Tested TurboQuant KV compression with Gemma 4 31B — 5.80x compression, perfect long-context recall, JSON output preserved

Quick experiment: I implemented Google Research's TurboQuant paper (arXiv 2504.19874) as a Python package and tested it with Google's brand new Gemma 4 31B model. The results exceeded the paper's claims. Setup: Hardware: RTX PRO 6000 Black…