I'm upgrading my setup to run larger models and need a second GPU to pair with my current RTX 4070 (12GB).
My Workloads:
LLMs: Up to 32B dense (Gemma 4 31B) and ~120B MoE (Qwen 122B10A). I mostly run Q4/IQ4/UD MXFP4 quants.
Image diffusion model: FireRed 1.1 (Q4).
Target: 30+ tokens/s at large contexts (up to 256k). I currently hit a memory ceiling around 131k context (as of yesterday, running Qwen 3.6 35B3A).
The Options & Market Constraints:
RTX 5070 Ti 16GB (New): ~1.2k USD.
RTX 3090 24GB (Used only): ~1k USD. (Pricing is inconsistent and finding one is even harder; it might go for more than 1k.)
RTX 5060 Ti 16GB (New): ~600 USD.
I strongly prefer buying new, since there is no reliable way to verify how old or how heavily used a card is.
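To put the three options side by side, here is a quick back-of-envelope sketch of price per GB of VRAM and the combined VRAM with the existing 12GB 4070. The prices are the approximate figures listed above, so treat the output as rough; the dictionary layout and the `summarize` helper are just illustrative names.

```python
# Approximate prices (USD) and VRAM (GB) from the options listed above.
options = {
    "RTX 5070 Ti (new)": {"price": 1200, "vram": 16},
    "RTX 3090 (used)": {"price": 1000, "vram": 24},
    "RTX 5060 Ti (new)": {"price": 600, "vram": 16},
}
EXISTING_VRAM_GB = 12  # the RTX 4070 already in the system


def summarize(options, existing_vram):
    """Return USD per GB and combined VRAM for each candidate card."""
    rows = {}
    for name, card in options.items():
        rows[name] = {
            "usd_per_gb": card["price"] / card["vram"],
            "combined_vram": card["vram"] + existing_vram,
        }
    return rows


summary = summarize(options, EXISTING_VRAM_GB)
for name, row in summary.items():
    print(f"{name}: {row['usd_per_gb']:.1f} USD/GB, "
          f"{row['combined_vram']} GB combined")
```

This only compares capacity against price; it says nothing about memory bandwidth, card length, or power draw.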
My Hardware Limits:
CPU/RAM: Ryzen 9 9950X, 80GB DDR5 (a pair of 24GB sticks plus a pair of 16GB sticks).
Mobo/PSU: X870E, MSI MAG A1000GLS PCIE5 1000W.
Clearance: GC-801 case with a front-mounted 360 AIO inside. Long cards like the ASUS TUF probably won't clear the radiator (that's my guess). I am limited to shorter tri-fan models (ASUS Prime, MSI Ventus 3X, Zotac Trinity).
Layout: New card in top PCI_E1 (x16), 4070 (2.55 slots) dropped to bottom PCI_E3 (x4).
tl;dr:
Will the combined 28GB of the 5070 Ti + 4070 comfortably handle 32B dense models at 200k+ context and 120B MoEs at 30+ tps? Or is the 36GB combined capacity of the 3090 path a hard requirement for this? I want to know if the extra 8GB VRAM is worth buying a 5-year-old used card and giving up Blackwell's FP8/FP4 perks.
I know they're approximately the same speed, but there's a VRAM difference, a size difference, and a PSU requirement difference. The 3090 is also old, and "used" can mean an ex-Bitcoin-mining card just as easily as a card from a former gamer who grew up.
My feeling is that going from 28GB to 36GB doesn't really "unlock" much, and that the real jumps are at 24, 48, and 96GB. I could be wrong, but Q4 feels like enough for me, and are there even any 70B+ models that would justify the jump?
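A rough way to sanity-check the 28GB vs 36GB question is to add the quantized weight size to the KV cache size at the target context. The sketch below uses the standard estimate (K and V each store `n_kv_heads * head_dim` values per layer per token). The architecture numbers (64 layers, 8 KV heads, head dimension 128) and the 4.5 bits per weight are placeholders I made up for illustration, not the config of any model named above, and the formula assumes full attention in every layer; models with sliding-window or other reduced-cache layers need less.

```python
GIB = 2**30


def weights_gib(params_billion, bits_per_weight):
    """Approximate size of the quantized weights in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """KV cache size in GiB, assuming full attention in every layer.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 corresponds to an fp16 cache.
    """
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_len * bytes_per_elem)
    return total_bytes / GIB


def fits(vram_gb, params_billion, bits_per_weight, **kv_args):
    """True if weights plus KV cache fit in the VRAM budget (no overhead)."""
    need = weights_gib(params_billion, bits_per_weight) + kv_cache_gib(**kv_args)
    return need <= vram_gb


# Hypothetical 32B dense model at ~4.5 bits/weight, 200k-token context.
placeholder = dict(n_layers=64, n_kv_heads=8, head_dim=128,
                   context_len=200 * 1024)
print(f"weights:  {weights_gib(32, 4.5):.1f} GiB")
print(f"KV cache: {kv_cache_gib(**placeholder):.1f} GiB (fp16)")
for budget in (28, 36):
    print(f"fits in {budget} GB: {fits(budget, 32, 4.5, **placeholder)}")
```

With these placeholder numbers the cache alone is larger than either budget, so under this assumption the context length and cache precision matter more than the 8GB difference. Plugging in the real layer count, KV head count, and head dimension from a model's config gives a more meaningful answer.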