GPU advice for Qwen 3.5 27B / Gemma 4 31B (dense) — aiming for 64K ctx, 30+ t/s

Hey all,

Looking for some real-world advice on GPU choices for running the new dense models — mainly Qwen 3.5 27B and Gemma 4 31B.

What I’m targeting

  • Context: 64K+ (ideally higher later)
  • Speed: 30+ tok/s @ tg128 minimum
  • Power: not critical, but lower is a bonus

From what I’ve seen, these dense models are way more demanding than MoE at a similar parameter count — every weight is active on every token, and the KV cache is full-size.

Why not MoE?

I’m already running MoE just fine on P40s:

  • Gemma 4 26B MoE
  • ~32K ctx
  • ~42+ tok/s @ tg128

So now I want to move to dense models for better quality / reasoning.

Budget

  • ~2500 AUD (~$1800 USD)
  • GPU only (already have CPU / RAM / board)
  • Ignore PCIe lane limits for now

Options I’m considering

A. 2× 9070 XT (16GB)
B. 1× R9 9700 (32GB)
C. 2× 7900 XTX (24GB)
D. 1× RTX Pro 4000 (24GB)

N. 1× Intel Arc Pro B70 (32GB, maybe future option, but not now)

My current understanding (please correct me)

  • 16GB cards → basically forced into pipeline parallel, so per-GPU compute matters a lot
  • 2× 7900 XTX should have the best raw throughput
  • RTX Pro 4000 maybe similar class, but VRAM limits context flexibility
  • 32GB single card (R9 9700) is attractive for KV cache / long ctx, BUT:
    • perf ≈ 9070 XT?
    • price = ~2× 9070 XT + extra GPU…
  • 2× 9070 XT might be best “budget parallel” option

Concerns (based on what I’ve seen here)

  • KV cache is brutal on Gemma 4 31B — other threads call the “massive KV cache… biggest drawback”
  • Even people with large VRAM struggle with higher quants / context
  • 24GB seems like the minimum viable tier for 31B dense
  • Long context scaling is still very hardware-sensitive
  • Multi-GPU scaling (esp PCIe) seems very inconsistent depending on backend
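On the KV cache point, a quick estimate of cache size at 64K. The architecture numbers below are assumptions for a generic ~31B dense model (48 layers, 8 GQA KV heads, head_dim 128, fp16 cache) — the real configs may differ, so plug in the actual values from the model card:

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_el=2):
    """KV cache size in GiB: one K and one V tensor per layer (hence the 2),
    fp16 by default (bytes_per_el=2); halve bytes_per_el for q8 cache."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_el / 2**30

# Assumed shape for a ~31B dense model (hypothetical, not from the model card):
print(round(kv_cache_gib(48, 8, 128, 65536), 1))     # fp16 @ 64K -> ~12.0 GiB
print(round(kv_cache_gib(48, 8, 128, 65536, 1), 1))  # q8 cache   -> ~6.0 GiB
```

Stack ~12 GiB of fp16 cache on top of ~18 GB of Q4 weights and you're past 30 GB total, which matches the "even big-VRAM people struggle" reports. Quantizing the cache to q8 helps but doesn't make 24GB comfortable.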

What I want to know

If you’ve actually run Qwen 3.5 27B / Gemma 4 31B (dense):

  • What GPU are you using?
  • What real tok/s are you getting (esp @ 64K+)?
  • Does multi-GPU actually scale well or just look good on paper?
  • Is 32GB single GPU > dual 16/24GB in practice?
  • Any regrets / “don’t buy this” advice?

Bonus question

If you had ~$1800 today, would you:

  • go multi-GPU AMD (cheap + raw compute)
  • or single high-VRAM card (simpler + better ctx)

Appreciate any real benchmarks / configs 🙏

submitted by /u/Fit-Courage5400
