[Benchmark] Dual RTX 5090 Distributed Inference via llama.cpp RPC – Running 122B MoE at 96 t/s over 2.5GbE
| Model | Size | Single 5090 (t/s) | Dual 5090 RPC (t/s) | Note |
|---|---|---|---|---|
| Qwen3.5-27B (Q6_K) | 20.9 GB | 59.83 | 55.41 | -7% overhead |
| Qwen3.5-35B MoE (Q6_K) | 26.8 GB | 206.76 | 150.99 | Interconnect bottleneck |
| Qwen2.5-32B (Q6_K) | 25.0 GB | 54.69 | 51.47 | Stable scaling |

Qwen2….
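For reference, the distributed setup in the title can be reproduced with llama.cpp's RPC backend: one machine runs `rpc-server` to expose its GPU over the network, and the other runs inference with `--rpc` pointing at it. This is a minimal sketch; the IP address, port, and model path are placeholders, and your build directory layout may differ.

```shell
# On the remote machine (second 5090): build llama.cpp with CUDA and the
# RPC backend enabled, then expose the GPU as an RPC worker.
cmake -B build -DGGML_CUDA=ON -DGGML_RPC=ON
cmake --build build --config Release

# Bind the worker to the 2.5GbE interface (host/port are placeholders).
./build/bin/rpc-server --host 0.0.0.0 --port 50052

# On the local machine (first 5090): run inference, splitting layers
# across the local GPU and the remote RPC worker (model path is a placeholder).
./build/bin/llama-cli -m model.gguf -ngl 99 \
  --rpc 192.168.1.2:50052 \
  -p "Hello"
```

All traffic for the offloaded layers crosses the link on every token, which is consistent with the interconnect-bottleneck pattern in the table: dense models lose a few percent, while splitting a fast MoE model is penalized more heavily.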