Oobabooga recently published 5 detailed benchmark reports on GGUF performance for Gemma 4 26B-A4B, Gemma 4 E4B, Qwen3.5-35-A3B, and Qwen3.5-27B, covering releases from Unsloth, Bartowski, LM Studio, GGML, Mradermacher, AesSedai, and Ubergarm.

The benchmark methodology is based on KL divergence over a dataset of around 250,000 tokens spanning six categories: coding, general chat, tool calling, science, non-Latin scripts, and long documents. That gives a much clearer picture of real chat performance than benchmarks based only on wikitext.

You can find the reports and results here: https://localbench.substack.com/ (the 31B analysis is free to read). Running these benchmarks takes a lot of time and money, so it's worth supporting oobabooga if you find the work useful. I expect oobabooga will make one or two of the paid reports free from time to time.

The Gemma 4 26B-A4B, Gemma 4 E4B, and similar reports are incredibly thorough, but they can also be a bit confusing because of how many quants are tested: each report covers roughly 70 to 90 GGUF quants. That's insane, but overall the reports are very detailed and valuable, which makes them a fun read. I'm very keen to see more reports down the line.
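For anyone unfamiliar with the metric: the idea is to compare the quantized model's next-token probability distribution against the full-precision model's at each position, then average over the dataset. Here's a minimal, illustrative sketch of the per-token KL computation (toy logits, not oobabooga's actual harness or dataset):

```python
import math

def softmax(logits):
    """Convert a logit vector into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(ref_logits, quant_logits):
    """KL(P || Q) in nats, where P is the full-precision model's
    next-token distribution and Q is the quantized model's."""
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy logits for a 3-token vocabulary (hypothetical values):
ref = [2.0, 1.0, 0.5]     # full-precision model
quant = [1.9, 1.1, 0.4]   # quantized model, slightly perturbed

print(kl_divergence(ref, ref))    # identical distributions give 0.0
print(kl_divergence(ref, quant))  # small positive value
```

In a real harness this per-token KL would be averaged across all ~250k token positions, so lower mean KL means the quant tracks the original model's behavior more closely, which is why it correlates better with chat quality than wikitext perplexity alone.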