Qwen3.6 27B NVFP4 + MTP on a single RTX 5090: 200k context working in vLLM
So I spent some time testing Qwen3.6 27B NVFP4 on my RTX 5090 and wanted to share the numbers, since most of the recent good posts are either about 48GB cards, FP8, or llama.cpp/GGUF. This is not a "best possible setup" claim. More like: this is what I actually got working on a single card.
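To make the setup concrete, here is a minimal sketch of how a run like this is typically launched through vLLM's Python API. The model repo id, memory-utilization value, and the speculative-decoding config (how MTP gets enabled) are my assumptions, not the exact settings from this post, and the method string may differ depending on your vLLM version.

```python
# Minimal sketch, not the author's exact config.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3.6-27B-NVFP4",  # hypothetical repo id for the NVFP4 checkpoint
    max_model_len=200_000,           # the ~200k context window discussed here
    gpu_memory_utilization=0.95,     # leave a little headroom on the 32 GB 5090
    # MTP goes through vLLM's speculative decoding config; the exact
    # method string is an assumption and depends on vLLM version/model support.
    speculative_config={"method": "mtp", "num_speculative_tokens": 1},
)

out = llm.generate(["Hello"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```

NVFP4 quantization is normally picked up automatically from the checkpoint's config, so no explicit quantization flag should be needed.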