Hey r/LocalLLaMA, we ran KL divergence benchmarks for Gemma 4 26B-A4B GGUFs across providers to help you pick the best quant.
- KLD measures how closely a quantized model matches the original BF16 output distribution, so lower values indicate better retained accuracy.
- By mean KL divergence, nearly all Unsloth GGUFs sit on the Pareto frontier (best accuracy for their size).
- This makes Unsloth the top performer in 21 of 22 sizes, with a similar trend for 99.9% KLD and other metrics.
- We also updated our Q6_K quants to be more dynamic. The previous quants were already perfectly fine, so there's no need to re-download, but the new version is slightly more accurate and slightly larger; it's up to you if you want it. The same was done for Qwen3.6.
- We're also introducing a new UD-IQ4_NL_XL quant that fits in 16GB VRAM. UD-IQ4_NL_XL (14.6GB) sits between UD-IQ4_XS (13.4GB) and UD-Q4_K_S (16.4GB). The same was done for Qwen3.6.
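For anyone curious what the headline metric actually computes: KLD compares, token by token, the next-token probability distribution of the BF16 model against that of the quant. The real numbers come from tooling such as llama.cpp's KLD statistics; below is only a minimal NumPy sketch of the metric itself, with a hypothetical toy vocabulary and made-up logits.

```python
import numpy as np

def softmax(logits):
    # subtract the row max for numerical stability before exponentiating
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_kld(ref_logits, quant_logits):
    """Mean KL divergence D_KL(P_ref || P_quant), averaged over token positions."""
    p = softmax(ref_logits)    # BF16 reference distribution
    q = softmax(quant_logits)  # quantized model's distribution
    per_token = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return per_token.mean()

# toy example: two token positions over a 4-word vocabulary (hypothetical values)
ref = np.array([[2.0, 1.0, 0.5, 0.1],
                [0.3, 2.2, 1.1, 0.0]])
# simulate a quant as the reference plus small logit perturbations
quant = ref + np.random.default_rng(0).normal(0.0, 0.05, ref.shape)

print(mean_kld(ref, quant))  # small positive number; 0.0 for a perfect match
```

A quant that reproduced the BF16 logits exactly would score 0; larger values mean the output distribution has drifted further from the original.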
Reddit mobile compresses images, so for high-quality versions of the graphs see: Gemma 4 Benchmarks and Qwen3.6 Benchmarks
We also updated our MLX quants to be more dynamic, with better layer selection (there are limitations due to MLX). See here
| MLX Metrics | UD-4bit (Old) | UD-4bit (New) | MLX 4.4bit MSQ |
| --- | --- | --- | --- |
| Perplexity | 4.772 | 4.766 | 4.864 |
| Mean KLD | 0.0177 | 0.0163 | 0.0878 |
| 99.9% KLD | 0.8901 | 0.8398 | 2.9597 |
| Disk Size | 21.4 GB | 21.6 GB | 21.2 GB |
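On the two KLD rows above: mean KLD summarizes typical per-token drift, while "99.9% KLD" (as reported by llama.cpp-style KLD statistics) is the 99.9th percentile of the per-token values, i.e. how bad the worst 0.1% of tokens get. A small sketch of how the two summaries relate, using synthetic per-token KLD values rather than real measurements:

```python
import numpy as np

# synthetic per-token KL divergence values (hypothetical; real values come
# from comparing a quant's logits against BF16 at every token position)
token_kld = np.random.default_rng(1).exponential(scale=0.02, size=100_000)

mean_val = token_kld.mean()                 # typical per-token drift
p999_val = np.quantile(token_kld, 0.999)    # tail: worst 0.1% of tokens

print(mean_val, p999_val)
```

The tail percentile is always well above the mean for skewed per-token distributions, which is why a quant can look fine on mean KLD yet still mangle occasional tokens badly.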
Gemma 4 GGUFs: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF
Qwen3.6 GGUFs: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF