New Bartowski Gemma 4 quants are a lot slower?

By /u/Top-Rub-4670 / April 11, 2026

Bartowski has uploaded new quants for Gemma 4. I've downloaded them for 26B and E4B.

Compared to his original release, I'm getting about half the tg/s for both of them, and about 75% of the pp/s.

Does anyone know what changed? I'm assuming the weights aren't the problem, but maybe the GGUF header now enables a llama.cpp feature that my hardware dislikes?
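One way to check the header theory would be to dump the metadata key/value pairs of the old and new files and diff them. Below is a minimal sketch in plain Python, assuming little-endian GGUF v2 or v3 files. The function names `read_gguf_metadata` and `diff_metadata` are made up for illustration, and the two paths are whatever old and new files you pass on the command line. It only reads the metadata section, so it would not show per-tensor quant type changes.

```python
import struct
import sys

# Scalar value types from the GGUF spec: type id -> struct format (little-endian assumed)
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _unpack(f, fmt):
    """Read exactly one value of the given struct format from the stream."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    length = _unpack(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f, vtype):
    if vtype in _SCALARS:
        return _unpack(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _unpack(f, "<I")
        count = _unpack(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f):
    """Return the metadata key/value pairs of a GGUF file as a dict."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _unpack(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit counts and is not handled here")
    _unpack(f, "<Q")             # tensor count, not needed for the metadata diff
    kv_count = _unpack(f, "<Q")  # number of metadata key/value pairs
    meta = {}
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _unpack(f, "<I")
        meta[key] = _read_value(f, vtype)
    return meta


def diff_metadata(old, new):
    """Map each key whose value differs to (old_value, new_value); None means absent."""
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        a, b = old.get(key), new.get(key)
        if a != b:
            changes[key] = (a, b)
    return changes


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as f_old, open(sys.argv[2], "rb") as f_new:
        changes = diff_metadata(read_gguf_metadata(f_old), read_gguf_metadata(f_new))
    for key, (a, b) in changes.items():
        # Truncate long values such as tokenizer arrays so the output stays readable
        print(f"{key}: {str(a)[:80]} -> {str(b)[:80]}")
```

Run it with the old file first and the new file second. If nothing relevant differs in the metadata, the slowdown more likely comes from the tensor quant types or from the llama.cpp build itself.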

Thanks for any information!
