Is it normal for Gemma 4 26B/31B to run this fast on an Intel laptop? (288V / CachyOS)

Hey everyone, I just got into local LLMs about a week ago. I tried Ollama and LM Studio on my Core Ultra 9 288V, but they kept failing or hard-stopping on the MoE models, so I figured I'd just build the environment myself.

I couldn’t get OpenVINO to play nice with the NPU for these larger models yet, so I just compiled a custom Vulkan bridge for the GPU instead. It seems to be working?
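If anyone wants to try the same route, a build along these lines should get you there. This assumes the "Vulkan bridge" is llama.cpp compiled with its Vulkan backend (which is what usually runs GGUF files over Vulkan); the package names are for Arch-based distros like CachyOS and may differ on yours:

```shell
# Vulkan headers/loader, shader compiler, and tools (Arch/CachyOS package names)
sudo pacman -S --needed vulkan-headers vulkan-icd-loader vulkan-tools shaderc cmake

# sanity check: the Xe2 iGPU should show up as a Vulkan device
vulkaninfo --summary

# build llama.cpp with the Vulkan backend enabled
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j
```

The binaries end up in `build/bin/`.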

Performance Stats:

  • Model: Gemma-4-26B-it-i1 (GGUF)
  • Speed: 7-12 t/s (16k context)
  • Hardware Use: 95-100% GPU, 10-40% CPU, 20-24GB RAM.
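Those RAM numbers line up with a back-of-envelope estimate. Here's a quick sketch; the ~4.85 bits/weight average for Q4_K_M and the layer/head geometry are illustrative assumptions, not published numbers for this model:

```python
# Back-of-envelope RAM estimate for a Q4_K_M GGUF plus an f16 KV cache.
# Bits/weight and the layer/head geometry below are assumptions for illustration.

def q4km_footprint_gb(n_params, ctx_len, n_layers=48, n_kv_heads=8, head_dim=128):
    # Q4_K_M mixes quant types; ~4.85 bits/weight is a commonly cited average
    weights_gb = n_params * 4.85 / 8 / 1e9
    # f16 KV cache: K and V tensors (x2), 2 bytes per element
    kv_gb = n_layers * 2 * n_kv_heads * head_dim * ctx_len * 2 / 1e9
    return weights_gb, kv_gb

w, kv = q4km_footprint_gb(26e9, 16384)
print(f"weights ~{w:.1f} GB, KV cache ~{kv:.1f} GB, total ~{w + kv:.1f} GB")
# → weights ~15.8 GB, KV cache ~3.2 GB, total ~19.0 GB
```

Add a couple of GB for compute buffers and the runtime itself and you land right in the 20-24GB range I'm seeing.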

I also tried the 31B-it-i1-Q4_K_M.gguf version. It's a bit heavier but still totally usable:

  • Speed: Decent/Fluid (4-8k context)
  • Hardware Use: 100% GPU, ~30-60% CPU (Xe2 and the logic cores seem to be sharing the load well).
  • RAM: Pushing 26GB out of 29GB free, but 0GB swap used so far.

Is this a normal result for integrated graphics? At first I only got it working on the CPU, which was faster but unsustainable; once the Vulkan bridge was built, the load became balanced. I'm using CachyOS if that makes a difference.
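For anyone tuning the same balance: the knobs that matter most in llama.cpp are the GPU layer count and the CPU thread count. An example invocation (the model path is illustrative, and thread count should match your machine):

```shell
# -ngl 99: offload as many layers as fit to the GPU (Vulkan backend)
# -t 8:    CPU threads for whatever stays on the host
# -c 8192: context window
./build/bin/llama-cli \
  -m ~/models/gemma-31B-it-i1-Q4_K_M.gguf \
  -ngl 99 -t 8 -c 8192 -p "Hello"
```

For clean t/s numbers, `llama-bench` with the same `-m`/`-ngl` flags is more reliable than eyeballing chat output.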

Just wanted to see if I’m missing something or if Intel Lunar Lake is actually this cracked for local MoE.

submitted by /u/No-Key8555
