Bartowski has uploaded new quants for Gemma 4. I've downloaded them for 26B and E4B.
Compared to his original release, I'm getting about half the token generation speed (tg/s) and about 75% of the prompt processing speed (pp/s) on both of them.
Does anyone know what changed? I'm assuming the weights themselves aren't the problem, but maybe the GGUF header now enables a llama.cpp feature that my hardware dislikes?
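One way to test the header theory would be to diff the metadata key/value pairs of the old and new files. Below is a minimal sketch, assuming the GGUF v2/v3 header layout (magic, version, tensor count, KV count, then the key/value pairs). The helper names `read_metadata` and `diff_metadata` are made up for this post, and the script only looks at header metadata, not at per-tensor quantization types, which could also differ between uploads.

```python
import hashlib
import struct
import sys

# Scalar GGUF value types mapped to little-endian struct format characters.
SCALARS = {0: "B", 1: "b", 2: "H", 3: "h", 4: "I", 5: "i",
           6: "f", 7: "?", 10: "Q", 11: "q", 12: "d"}
STRING, ARRAY = 8, 9


def _unpack(f, fmt):
    """Read one little-endian value of the given struct format."""
    size = struct.calcsize("<" + fmt)
    return struct.unpack("<" + fmt, f.read(size))[0]


def _read_string(f):
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    length = _unpack(f, "Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f, vtype):
    if vtype in SCALARS:
        return _unpack(f, SCALARS[vtype])
    if vtype == STRING:
        return _read_string(f)
    if vtype == ARRAY:
        item_type = _unpack(f, "I")
        count = _unpack(f, "Q")
        items = [_read_value(f, item_type) for _ in range(count)]
        if count <= 8:
            return items
        # Summarize long arrays (e.g. tokenizer vocab) so the diff stays readable.
        digest = hashlib.sha1(repr(items).encode()).hexdigest()[:12]
        return f"<array of {count} items, sha1 {digest}>"
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_metadata(path):
    """Return the header key/value pairs of a GGUF file as a dict."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError(f"{path} is not a GGUF file")
        version = _unpack(f, "I")
        if version < 2:
            raise ValueError("this sketch only handles GGUF v2 and later")
        _tensor_count = _unpack(f, "Q")  # read to advance; not used here
        kv_count = _unpack(f, "Q")
        meta = {}
        for _ in range(kv_count):
            key = _read_string(f)
            vtype = _unpack(f, "I")
            meta[key] = _read_value(f, vtype)
        return meta


def diff_metadata(old, new):
    """Map each differing key to (old value, new value); None means absent."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        a, b = old.get(key), new.get(key)
        if a != b:
            changes[key] = (a, b)
    return changes


if __name__ == "__main__" and len(sys.argv) == 3:
    old_meta = read_metadata(sys.argv[1])
    new_meta = read_metadata(sys.argv[2])
    for key, (a, b) in diff_metadata(old_meta, new_meta).items():
        print(f"{key}: {a!r} -> {b!r}")
```

Run it with the old file first and the new file second. If nothing shows up, the header is probably not the cause and the difference is more likely in the tensor quantization mix or the llama.cpp build.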
Thanks for any information!