Fix: Dual Intel Arc GPUs using all system RAM during inference – found the cause and a working fix (llama.cpp SYCL)
If you're running dual Intel Arc GPUs with llama.cpp and your system RAM maxes out during multi-GPU inference even though the model fits entirely in VRAM, this post explains why that happens and how to fix it. I've been running dual Arc Pro B70s (32GB each, 64GB total).
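For context, a minimal dual-GPU SYCL launch looks something like the sketch below. The binary path, model path, split ratio, and layer count are placeholders, not values from this post, and `--no-mmap` (load weights with plain reads instead of keeping the GGUF file mapped into host memory) is one commonly cited lever for this symptom, not necessarily the exact fix described here:

```bash
# Standard oneAPI variable: restrict the SYCL runtime to the two Arc
# GPUs via Level Zero (device indices depend on your machine).
export ONEAPI_DEVICE_SELECTOR="level_zero:0,1"

# Lets the SYCL backend query per-device free VRAM through Level Zero
# sysman (documented in llama.cpp's SYCL build notes).
export ZES_ENABLE_SYSMAN=1

# -ngl 99             offload every layer to the GPUs
# --split-mode layer  distribute whole layers across both cards
# --tensor-split 1,1  split roughly 50/50 between GPU 0 and GPU 1
# --no-mmap           read the weights into buffers instead of leaving
#                     the model file mapped in the process after the
#                     tensors have been copied to VRAM
./build/bin/llama-cli \
  -m /path/to/model.gguf \
  -ngl 99 \
  --split-mode layer \
  --tensor-split 1,1 \
  --no-mmap \
  -p "Hello"
```

With mmap enabled (llama.cpp's default where supported), the mapped GGUF pages can stay resident in host memory alongside the copies in VRAM, which makes it look like the model is loaded twice; that's worth ruling out before digging into backend-specific causes.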