Hey all,
we're a mid-sized company (~70 people) currently planning to move a lot of our workloads on-prem instead of relying on cloud APIs. The goal for now is to run small to mid-sized models in the ~30B range, e.g. Qwen3.6 or Gemma4.
Use cases:
- Internal Chatbot (email, assistants, maybe some RAG)
- ~30 software devs, not yet doing agentic coding
- ML training (PyTorch, CNNs, ViTs)
- Some raytracing
We’ve got a server with 10 PCIe slots and are considering:
Option A (NVIDIA):
- 2× RTX 6000 Pro (as a starting point)
- ~192 GB VRAM total for 19k€
Option B (AMD):
- 10× Radeon AI Pro R9700
- ~320 GB VRAM total for ~15k€
Main concerns:
- Multi-GPU scaling (2 big vs 10 small)
- AMD vs NVIDIA for mixed workloads (esp. rendering, PyTorch training)
- Scaling options in the future
- We're currently using llama.cpp, but from what I've read here, vLLM would be a better fit for our multi-user use case. How does vLLM behave when splitting a model across many GPUs?
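For context on the last point, my understanding from the vLLM docs is that it splits models across GPUs via tensor parallelism and optionally pipeline parallelism. A rough sketch of how the two options might be launched (the model name is a placeholder, not what we'd necessarily run):

```shell
# Option A: 2x RTX 6000 Pro -- tensor parallelism across both cards
vllm serve some-org/some-30b-model --tensor-parallel-size 2

# Option B: 10x R9700 -- the tensor-parallel size generally has to
# divide the model's attention-head count, so 10-way TP often isn't
# valid; combining tensor and pipeline parallelism is one workaround:
vllm serve some-org/some-30b-model \
  --tensor-parallel-size 2 \
  --pipeline-parallel-size 5
```

If that's roughly right, it seems like the 10-GPU option adds parallelism-layout constraints that the 2-GPU option avoids, but I'd appreciate corrections from anyone running vLLM on many cards.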
What would you pick for a team setup like this?