I’ve got one 3090 and thanks to the help of MTP and all, I can do around 65 tok/s on qwen 3.6 dense 27b. But I’m running at Q4_M so everything fits and my context isn’t super high. Maybe 65k or up to 100k.
I’ve thrown around the idea of a second 3090. But I do already have some gaming PCs running parallel stuff with smaller 3080 (2x) and 4080S cards to support my 3090. So it seems the real benefit of a second 3090 is running at a higher quant.
But for those that do, have you noticed a big difference?
[link] [comments]