/u/__JockY__ - Provide.ai

Qwen3.6 27B FP8 runs with 200k tokens of BF16 KV cache at 80 TPS on a single RTX 5000 PRO 48GB

/u/__JockY__ / May 5, 2026

—-START HUMAN TEXT—- Hi all, I've seen a bunch of posts about squeezing 27B onto a 24GB card and all the quantization tricks involved in doing so. It's all amazing work, but at the end of the day a quantized model with quantized KV will ine…

Author name: /u/__JockY__

Qwen3.6 27B FP8 runs with 200k tokens of BF16 KV cache at 80 TPS on a single RTX 5000 PRO 48GB

Author name: /u/JockY