/u/val_in_tech - Provide.ai

GLM 5.1 Locally: 40tps, 2000+ pp/s

/u/val_in_tech / April 25, 2026

After some sglang patching and countless experiments, I managed to get the REAP-ed NVFP4 version running stable and fast on 4x RTX 6000 Pros (limited to 350W). Very happy with the performance and quality. Inference software is still under-optimized for those ca…

