LocalLLaMA

Gemma 4 31B at 256K Full Context on a Single RTX 5090 — TurboQuant KV Cache Benchmark

Just got Gemma 4 31B running at full 256K context on a single RTX 5090 using TurboQuant KV cache compression. System Specs Component Spec GPU NVIDIA GeForce RTX 5090 (32GB VRAM) CPU AMD Ryzen 9 9950X3D (16-core) RAM 64GB DDR5 OS Windows 11 …