LocalLLaMA

Slow tok/s when offloading NVFP4 model to CPU

Title. I was messing around with Qwen3.6 35B A3B Q4_K_XL on my RTX 5070 and got around 50 tok/s. I then realized I could be leveraging NVFP4 on my Blackwell GPU, but when I tried it, it barely reached 14 tok/s. The model doesn't fit in VRAM, so I h…