GPU advice for Qwen 3.5 27B / Gemma 4 31B (dense) — aiming for 64K ctx, 30+ t/s

Hey all,

Looking for some real-world advice on GPU choices for running the new dense models — mainly Qwen 3.5 27B and Gemma 4 31B.

What I’m targeting

  • Context: 64K+ (ideally higher later)
  • Speed: 30+ tok/s @ tg128 minimum
  • Power: not critical, but lower is a bonus

From what I’ve seen, these dense models are way more demanding than MoE at a similar parameter count — every weight is active on every token, and the KV cache is full-size.

Why not MoE?

I’m already running MoE just fine on P40s:

  • Gemma 4 26B MoE
  • ~32K ctx
  • ~42+ tok/s @ tg128

So now I want to move to dense models for better quality / reasoning.

Budget

  • ~2500 AUD (~$1800 USD)
  • GPU only (already have CPU / RAM / board)
  • Ignore PCIe lane limits for now

Options I’m considering

A. 2× 9070 XT (16GB)
B. 1× R9 9700 (32GB)
C. 2× 7900 XTX (24GB)
D. 1× RTX Pro 4000 (24GB)

N. 1× Intel Arc Pro B70 (32GB, maybe future option, but not now)

My current understanding (please correct me)

  • 16GB cards → basically forced into pipeline parallel, so per-GPU compute matters a lot
  • 2× 7900 XTX should have the best raw throughput
  • RTX Pro 4000 maybe similar class, but VRAM limits context flexibility
  • 32GB single card (R9 9700) is attractive for KV cache / long ctx, BUT:
    • perf ≈ 9070 XT?
    • price = ~2× 9070 XT + extra GPU…
  • 2× 9070 XT might be best “budget parallel” option

Concerns (based on what I’ve seen here)

  • KV cache is brutal on Gemma 4 31B — other threads call the “massive KV cache… biggest drawback”
  • Even people with large VRAM struggle with higher quants / context
  • 24GB seems like the minimum viable tier for 31B dense
  • Long context scaling is still very hardware-sensitive
  • Multi-GPU scaling (esp PCIe) seems very inconsistent depending on backend
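On the KV cache point, a quick estimate of cache size at 64K. The architecture numbers below are assumptions for a generic ~31B dense model (48 layers, 8 GQA KV heads, head_dim 128, fp16 cache) — the real configs may differ, so plug in the actual values from the model card:

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_el=2):
    """KV cache size in GiB: one K and one V tensor per layer (hence the 2),
    fp16 by default (bytes_per_el=2); halve bytes_per_el for q8 cache."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_el / 2**30

# Assumed shape for a ~31B dense model (hypothetical, not from the model card):
print(round(kv_cache_gib(48, 8, 128, 65536), 1))     # fp16 @ 64K -> ~12.0 GiB
print(round(kv_cache_gib(48, 8, 128, 65536, 1), 1))  # q8 cache   -> ~6.0 GiB
```

Stack ~12 GiB of fp16 cache on top of ~18 GB of Q4 weights and you're past 30 GB total, which matches the "even big-VRAM people struggle" reports. Quantizing the cache to q8 helps but doesn't make 24GB comfortable.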

What I want to know

If you’ve actually run Qwen 3.5 27B / Gemma 4 31B (dense):

  • What GPU are you using?
  • What real tok/s are you getting (esp @ 64K+)?
  • Does multi-GPU actually scale well or just look good on paper?
  • Is 32GB single GPU > dual 16/24GB in practice?
  • Any regrets / “don’t buy this” advice?

Bonus question

If you had ~$1800 today, would you:

  • go multi-GPU AMD (cheap + raw compute)
  • or single high-VRAM card (simpler + better ctx)

Appreciate any real benchmarks / configs 🙏

submitted by /u/Fit-Courage5400
