/u/6c5d1129 - Provide.ai

Slow tok/s when offloading NVFP4 model to CPU

/u/6c5d1129 / May 4, 2026

Title. I was messing around with Qwen3.6 35B A3B Q4_K_XL on my RTX 5070 and got around 50 tok/s. I then realized I could be leveraging NVFP4 on my Blackwell GPU, but when I tried it, it barely reached 14 tok/s. The model doesn't fit in VRAM, so I h…
