LocalLLaMA

GLM 5.1 Locally: 40tps, 2000+ pp/s

After some sglang patching and countless experiments, I managed to get the REAP-ed NVFP4 version running stably and fast on 4 x RTX 6000 Pros (power-limited to 350 W). Very happy with the performance and quality. Inference software is still under-optimized for those ca…
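
To put the headline numbers in perspective, here is a rough back-of-the-envelope sketch of single-request latency. It assumes the rates in the title hold constant (2000 prompt tokens/s for prefill, 40 tokens/s for generation), which ignores batching and any slowdown at long context. The function name and the example token counts are hypothetical, picked only for illustration.

```python
def estimate_latency_s(prompt_tokens: int, output_tokens: int,
                       pp_rate: float = 2000.0, gen_rate: float = 40.0) -> float:
    """Rough end-to-end latency in seconds for one request.

    pp_rate  -- prompt processing speed in tokens/s (title says 2000+)
    gen_rate -- generation speed in tokens/s (title says 40)
    Assumption: both rates stay flat regardless of context length.
    """
    prefill_s = prompt_tokens / pp_rate   # time to ingest the prompt
    decode_s = output_tokens / gen_rate   # time to generate the reply
    return prefill_s + decode_s


if __name__ == "__main__":
    # Hypothetical request: 20k-token prompt, 400-token answer.
    print(f"{estimate_latency_s(20_000, 400):.1f} s")
```

Since the title gives prompt processing as "2000+", the prefill part of this estimate is on the pessimistic side.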