Vulkan backend outperforms ROCm on Strix Halo (gfx1151) — llama.cpp benchmark

Just ran some llama-bench comparisons between ROCm and Vulkan backends on my Strix Halo system. Vulkan came out ahead, which surprised me.

Hardware:

- AMD Radeon 8060S (gfx1151 / Strix Halo)

- 64GB unified VRAM

- Arch Linux, ROCm 7.2.2 via pacman

- Mesa RADV Vulkan driver

Model: Qwen3.6-35B-A3B (MoE, Q6_K quantized, ~30GB)

llama.cpp: commit 27aef3dd9

Flags: -ngl 99 -p 512 -n 128 -t 8 -fa 1 -b 2048 -ub 512
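For reference, the flags above correspond to a `llama-bench` invocation along these lines (the model filename and argument order are my reconstruction, not from the post; only the flags themselves are given above):

```shell
# Hypothetical reconstruction of the benchmark command.
# The model path is assumed; the flags are the ones listed above.
./build/bin/llama-bench \
  -m models/Qwen3-MoE-Q6_K.gguf \
  -ngl 99 -p 512 -n 128 -t 8 \
  -fa 1 -b 2048 -ub 512 \
  -dev Vulkan0
```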

Results (tokens/sec):

| Backend | pp512 (t/s) | tg128 (t/s) | Std Dev |
|---------|-------------|-------------|---------|
| ROCm0   | 841         | 42.3        | ±1.8    |
| Vulkan0 | 867         | 51.2        | ±0.5    |

Vulkan is ~21% faster at token generation and more stable (lower run-to-run variance). Prompt processing is roughly equal (~3% in Vulkan's favor).
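The ~21% figure can be checked directly from the tg128 numbers in the table:

```python
# Token-generation speedup computed from the tg128 column above.
rocm_tg = 42.3    # tokens/sec, ROCm0
vulkan_tg = 51.2  # tokens/sec, Vulkan0

speedup_pct = (vulkan_tg / rocm_tg - 1) * 100
print(f"Vulkan tg128 speedup: {speedup_pct:.0f}%")
```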

I built both backends into the same binary (`-DGGML_HIP=ON -DGGML_VULKAN=ON`). Using `-dev Vulkan0` gives better results than ROCm for this workload.
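For anyone reproducing the dual-backend build, the configure step looks roughly like this (only the two `-DGGML_*` flags come from the post; the `AMDGPU_TARGETS` setting for gfx1151 is an assumption):

```shell
# Sketch of a dual-backend llama.cpp build.
# -DAMDGPU_TARGETS is an assumed extra flag for gfx1151, not from the post.
cmake -B build -DGGML_HIP=ON -DGGML_VULKAN=ON -DAMDGPU_TARGETS=gfx1151
cmake --build build --config Release -j
# Pick the backend at runtime, e.g. -dev Vulkan0 or -dev ROCm0.
```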

Curious if anyone else on Strix Halo or other RDNA3.5 chips has seen the same thing. ROCm seems to fall back to slower code paths for certain ops on this GPU.

submitted by /u/FeiX7