# Qwen 3.5 35B on LocalAI: Vulkan vs ROCm

Hey everyone! 👋 Just finished running a bunch of benchmarks on the new Qwen 3.5 35B models using LocalAI and figured I'd share the results. I was curious how the Vulkan and ROCm backends stack up against each other for these two quant/source variants, so I ran each model on both backends:

- Qwen3.5-35B-A3B-APEX-I-Quality (mudler)
- Qwen3.5-35B-A3B-ThinkingCoder (unsloth)

Context depths tested: 0, 4K, 8K, 16K, 32K, 65K, 100K, and up to 200K tokens.

## System Environment

Lemonade Version: 10.1.0

```text
vulkan : 'b8681'
rocm   : 'b1232'
cpu    : 'b8681'
```

## The results

### 1. Qwen3.5-35B-A3B-APEX-I-Quality (mudler)

(See charts 1 & 2)

On token generation, Vulkan is the clear winner here, consistently outperforming ROCm. At zero context, Vulkan hits ~57.5 t/s compared to ROCm's ~50.0 t/s. As context grows to 100K, Vulkan maintains a healthy ~38.6 t/s while ROCm drops to ~35.7 t/s.

Prompt processing is where ROCm shows its strength, though Vulkan is very competitive. At 4K context, ROCm hits ~885 t/s while Vulkan is at ~759 t/s. The gap remains significant even at higher context depths.

### 2. Qwen3.5-35B-A3B-ThinkingCoder (unsloth)

(See charts 3 & 4)

This variant follows a very similar pattern. On token generation, Vulkan again takes the lead, starting at ~53.3 t/s (vs ROCm's ~46.6 t/s) and maintaining it even at 100K context. Prompt processing is notably faster on ROCm, hitting ~1052 t/s at 2K context, whereas Vulkan sits around ~798 t/s.
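To put the gaps in perspective, here's a quick sketch that turns the figures above into relative speedups. The values are transcribed from the charts, so treat this as a rough illustration rather than part of the benchmark tooling:

```python
# Relative speedup of one backend over another, using the
# token-generation and prompt-processing figures quoted above.

def speedup_pct(a: float, b: float) -> float:
    """Percentage by which rate `a` exceeds rate `b` (both in t/s)."""
    return (a / b - 1.0) * 100.0

# Token generation, APEX-I-Quality variant: (vulkan t/s, rocm t/s)
tg = {"0": (57.5, 50.0), "100K": (38.6, 35.7)}
for ctx, (vk, rocm) in tg.items():
    print(f"ctx {ctx}: Vulkan ahead by {speedup_pct(vk, rocm):.1f}% on generation")

# Prompt processing at 4K context: ROCm 885 t/s vs Vulkan 759 t/s
print(f"ctx 4K: ROCm ahead by {speedup_pct(885, 759):.1f}% on prompt processing")
```

So roughly a 15% generation lead for Vulkan at zero context, shrinking as context grows, against a ~17% prompt-processing lead for ROCm.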
## Big picture
For day-to-day use, if you want the fastest per-token response, Vulkan is the way to go. If you're processing massive amounts of text in a single prompt, ROCm might give you the edge.

*Benchmarks done with llama-benchy.*
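That tradeoff can be made concrete with a crude end-to-end latency model: total time ≈ prompt_tokens / prompt-processing rate + output_tokens / generation rate. The sketch below plugs in the ThinkingCoder low-context figures from above (real rates drop as context grows, so this is only a ballpark):

```python
# Crude latency model: time = prompt/pp_rate + output/tg_rate (seconds).
# Rates are the low-context ThinkingCoder figures quoted above.

def latency_s(prompt_toks: int, out_toks: int, pp_rate: float, tg_rate: float) -> float:
    return prompt_toks / pp_rate + out_toks / tg_rate

VULKAN = dict(pp_rate=798.0, tg_rate=53.3)
ROCM = dict(pp_rate=1052.0, tg_rate=46.6)

# Chatty workload: short prompt, long answer -> Vulkan's generation lead wins.
print(latency_s(512, 1024, **VULKAN), latency_s(512, 1024, **ROCM))

# Big-document workload: huge prompt, short answer -> ROCm's prompt speed wins.
print(latency_s(32768, 256, **VULKAN), latency_s(32768, 256, **ROCM))
```

With a 512-token prompt and a 1024-token reply, Vulkan finishes first; flip the ratio to a 32K prompt with a short answer and ROCm comes out ahead.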