Fix: Dual Intel Arc GPUs using all system RAM during inference – found the cause and a working fix (llama.cpp SYCL)
If you're running dual Intel Arc GPUs with llama.cpp and your system RAM maxes out during multi-GPU inference even though the model fits entirely in VRAM, this post explains why that happens and how to fix it. I've been running dual Arc Pro B70s (32GB each, 64GB total).
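For context, a minimal dual-GPU SYCL launch looks something like the sketch below. The binary path, model path, split ratio, and layer count are placeholders, not values from this post, and `--no-mmap` (load weights with plain reads instead of keeping the GGUF file mapped into host memory) is one commonly cited lever for this symptom, not necessarily the exact fix described here:

```bash
# Standard oneAPI variable: restrict the SYCL runtime to the two Arc
# GPUs via Level Zero (device indices depend on your machine).
export ONEAPI_DEVICE_SELECTOR="level_zero:0,1"

# Lets the SYCL backend query per-device free VRAM through Level Zero
# sysman (documented in llama.cpp's SYCL build notes).
export ZES_ENABLE_SYSMAN=1

# -ngl 99             offload every layer to the GPUs
# --split-mode layer  distribute whole layers across both cards
# --tensor-split 1,1  split roughly 50/50 between GPU 0 and GPU 1
# --no-mmap           read the weights into buffers instead of leaving
#                     the model file mapped in the process after the
#                     tensors have been copied to VRAM
./build/bin/llama-cli \
  -m /path/to/model.gguf \
  -ngl 99 \
  --split-mode layer \
  --tensor-split 1,1 \
  --no-mmap \
  -p "Hello"
```

With mmap enabled (llama.cpp's default where supported), the mapped GGUF pages can stay resident in host memory alongside the copies in VRAM, which makes it look like the model is loaded twice; that's worth ruling out before digging into backend-specific causes.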