Hey r/LocalLLaMA, we ran KL divergence benchmarks for Gemma 4 26B-A4B GGUFs across providers to help you pick the best quant.
- KLD measures how closely a quantized model matches the original BF16 output distribution, so lower values indicate better retained accuracy.
- By mean KL divergence, nearly all Unsloth GGUFs sit on the Pareto frontier (best accuracy for their size).
- This makes Unsloth the top performer in 21 of 22 sizes, with a similar trend for 99.9% KLD and other metrics.
- We also updated our Q6_K quants to be more dynamic. The previous quants were already perfectly fine, so there's no need to re-download, but the new version is slightly more accurate and slightly larger; it's up to you if you want it. The same was done for Qwen3.6.
- We're also introducing a new UD-IQ4_NL_XL quant that fits in 16GB VRAM. UD-IQ4_NL_XL (14.6GB) sits between UD-IQ4_XS (13.4GB) and UD-Q4_K_S (16.4GB). The same was done for Qwen3.6.
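For anyone curious what the headline metric actually computes: KLD compares, token by token, the next-token probability distribution of the BF16 model against that of the quant. The real numbers come from tooling such as llama.cpp's KLD statistics; below is only a minimal NumPy sketch of the metric itself, with a hypothetical toy vocabulary and made-up logits.

```python
import numpy as np

def softmax(logits):
    # subtract the row max for numerical stability before exponentiating
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_kld(ref_logits, quant_logits):
    """Mean KL divergence D_KL(P_ref || P_quant), averaged over token positions."""
    p = softmax(ref_logits)    # BF16 reference distribution
    q = softmax(quant_logits)  # quantized model's distribution
    per_token = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return per_token.mean()

# toy example: two token positions over a 4-word vocabulary (hypothetical values)
ref = np.array([[2.0, 1.0, 0.5, 0.1],
                [0.3, 2.2, 1.1, 0.0]])
# simulate a quant as the reference plus small logit perturbations
quant = ref + np.random.default_rng(0).normal(0.0, 0.05, ref.shape)

print(mean_kld(ref, quant))  # small positive number; 0.0 for a perfect match
```

A quant that reproduced the BF16 logits exactly would score 0; larger values mean the output distribution has drifted further from the original.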
Reddit mobile compresses images, so for high-quality versions of the graphs see: Gemma 4 Benchmarks and Qwen3.6 Benchmarks
We also updated our MLX quants to be more dynamic, with better layer selection (there are limitations due to MLX). See here
| MLX Metrics | UD-4bit (Old) | UD-4bit (New) | MLX 4.4bit MSQ |
| --- | --- | --- | --- |
| Perplexity | 4.772 | 4.766 | 4.864 |
| Mean KLD | 0.0177 | 0.0163 | 0.0878 |
| 99.9% KLD | 0.8901 | 0.8398 | 2.9597 |
| Disk Size | 21.4 GB | 21.6 GB | 21.2 GB |
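On the two KLD rows above: mean KLD summarizes typical per-token drift, while "99.9% KLD" (as reported by llama.cpp-style KLD statistics) is the 99.9th percentile of the per-token values, i.e. how bad the worst 0.1% of tokens get. A small sketch of how the two summaries relate, using synthetic per-token KLD values rather than real measurements:

```python
import numpy as np

# synthetic per-token KL divergence values (hypothetical; real values come
# from comparing a quant's logits against BF16 at every token position)
token_kld = np.random.default_rng(1).exponential(scale=0.02, size=100_000)

mean_val = token_kld.mean()                 # typical per-token drift
p999_val = np.quantile(token_kld, 0.999)    # tail: worst 0.1% of tokens

print(mean_val, p999_val)
```

The tail percentile is always well above the mean for skewed per-token distributions, which is why a quant can look fine on mean KLD yet still mangle occasional tokens badly.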
Gemma 4 GGUFs: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF
Qwen3.6 GGUFs: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF