A friend is going on vacation for a couple weeks and is lending me an RTX 6000 Pro rig to mess around with.
Holy cow, it is so much faster than my 4080 Super! Some preliminary LM Studio benches show roughly 10x in token generation and 60x in prompt processing, and I haven't even started tweaking anything yet.
4080 Super: Qwen3.6 27B Q2 quant at ~6 tk/s. TTFT was ~60 sec.
RTX 6000 Pro: Qwen3.6 27B Q8 XL at ~67 tk/s. TTFT was ~1 sec.
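
For anyone who wants to check the ratios, here is a quick sketch of the arithmetic using the rough numbers above. The `speedup` helper is just something made up for illustration, and treating TTFT as a stand-in for prompt processing speed assumes the same prompt was used on both cards. Keep in mind the two runs used different quants (Q2 vs Q8), so this is not an apples-to-apples comparison.

```python
def speedup(baseline: float, new: float, higher_is_better: bool = True) -> float:
    """Return how many times better `new` is than `baseline`.

    For throughput (tk/s) higher is better, so the ratio is new / baseline.
    For latency (TTFT in seconds) lower is better, so the ratio is flipped.
    """
    return new / baseline if higher_is_better else baseline / new


# Approximate figures from the benches above (4080 Super vs RTX 6000 Pro).
gen = speedup(6.0, 67.0)                            # token generation, tk/s
ttft = speedup(60.0, 1.0, higher_is_better=False)   # time to first token, sec

print(f"token generation: {gen:.1f}x")
print(f"TTFT: {ttft:.0f}x")
```

So the token generation gain works out to a bit over 11x, and the TTFT gain to about 60x.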
It will be exciting to see if the M5 Ultra can close the gap. Otherwise, I may need to pick up a couple of these bad boys, or whatever their successor is.