GPU strategy for local LLM + mixed workloads (70-person company) — NVIDIA vs AMD?

Hey all,

we’re a mid-sized company (~70 people) and are currently planning to move a lot of our workloads on-prem instead of relying on cloud APIs. The goal for now is to run small to mid-sized models in the ~30B range, like Qwen3.6 or Gemma4.

Use cases:

  • Internal chatbot (email, assistants, maybe some RAG)
  • ~30 software devs, currently not yet using agentic coding
  • ML training (PyTorch, CNNs, ViTs)
  • Some raytracing

We’ve got a server with 10 PCIe slots and are considering:

Option A (NVIDIA):

  • 2× RTX 6000 Pro (as a starting point)
  • ~192 GB VRAM total for 19k€

Option B (AMD):

  • 10× Radeon AI Pro R9700
  • ~320 GB VRAM total for ~15k€
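For comparing the two options, a quick back-of-envelope on weight memory for a dense ~30B model helps (illustrative only — real usage adds KV cache, activations, and runtime overhead, and varies by architecture and context length):

```python
# Rough VRAM sizing for a dense ~30B-parameter model.
# Rule of thumb: 1e9 params at N bytes/param ≈ N GB of weights.

def weight_vram_gb(params_b: float, bytes_per_param: float) -> float:
    """Approximate VRAM (GB) needed just for the model weights."""
    return params_b * bytes_per_param

for name, bytes_pp in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    print(f"30B @ {name}: ~{weight_vram_gb(30, bytes_pp):.0f} GB weights")
# 30B @ FP16: ~60 GB weights
# 30B @ INT8: ~30 GB weights
# 30B @ INT4: ~15 GB weights
```

So a 30B model at FP16 fits comfortably on a single 96 GB card in Option A, while Option B would need to shard it across at least two or three 32 GB cards even before accounting for KV cache.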

Main concerns:

  • Multi-GPU scaling (2 big vs 10 small)
  • AMD vs NVIDIA for mixed workloads (esp. rendering, PyTorch training)
  • Scaling options in the future
  • We are currently using llama.cpp, but from what I've read here, vLLM would be a better fit for our multi-user use case. How does vLLM behave when splitting a model across many GPUs?
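For context on the last point, this is the kind of multi-GPU setup I mean — a minimal sketch of serving a ~30B model sharded over 2 GPUs with vLLM's tensor parallelism (the model name is illustrative; flags should be checked against the vLLM version in use):

```shell
# Shard the model's weights across 2 GPUs via tensor parallelism.
# --max-model-len caps context length to keep KV-cache memory bounded.
vllm serve Qwen/Qwen2.5-32B-Instruct \
  --tensor-parallel-size 2 \
  --max-model-len 16384
```

My understanding is that tensor parallelism splits every layer across the GPUs (so they communicate on every forward pass, making interconnect bandwidth matter), which is part of why I'm unsure how well 10 smaller PCIe cards would scale versus 2 big ones.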

What would you pick for a team setup like this?

submitted by /u/Sufficient_Type_5792