MagicQuant v2.0 is here: hybrid mixed-quant GGUF models, learned Unsloth Dynamic tensor configurations, and a new benchmark philosophy that skips the nonsense!
Smaller files. Better KLD trades.
MagicQuant is not a quantization method.
It benchmarks GGUF models, learns their tensor behavior, and explores mixed configurations to uncover non-linear trade wins.
Sometimes that means a hybrid beats a baseline.
Sometimes the baseline already wins.
Sometimes nothing new survives.
That’s the point.
MagicQuant doesn’t push hybrids; it identifies the models that actually deserve to exist at a given size/fidelity trade-off. Survivors can be llama.cpp, Unsloth, hybrids, or anything else that proves itself.
If you used v1.0 (MXFP4 era), check the docs for why it was deprecated and what changed:
https://github.com/magiccodingman/MagicQuant-Wiki/blob/main/archival/version_1/README.md - This goes over v1.0 flaws, lessons, and more.
Examples
Let's jump into the juicy stuff! The following table refers to the published Qwen3-4B-Instruct-2507 model on Hugging Face. Yes, I know it's not one of the new fun versions, but it is my stable test baseline.
| Name | Provider | Quant Family | KLD | Size (GB) |
|---|---|---|---|---|
| LM-Q8_0 | llama.cpp | Q8_0 | 0.001339 | 3.99 |
| MQ-Q6_K_1 | MagicQuant | Q6_K | 0.001817 | 3.58 |
| UD-Q6_K_XL | Unsloth | UD-Q6_K_XL | 0.002111 | 3.41 |
| LM-Q6_K | llama.cpp | Q6_K | 0.004640 | 3.08 |
| <u>MQ-Q5_K_1</u> | MagicQuant | Q5_K | 0.006632 | 2.88 |
| <u>UD-Q5_K_XL</u> | Unsloth | UD-Q5_K_XL | 0.009839 | 2.73 |
| <u>MQ-Q4_K_M_1</u> | MagicQuant | Q4_K_M | 0.020346 | 2.44 |
| <u>LM-Q4_K_S</u> | llama.cpp | Q4_K_S | 0.029803 | 2.22 |
| LM-IQ4_XS | llama.cpp | IQ4_XS | 0.031300 | 2.11 |
| UD-Q3_K_XL | Unsloth | UD-Q3_K_XL | 0.072278 | 1.98 |
- LM: llama.cpp quantized
- UD: Unsloth Dynamic
- MQ: MagicQuant hybrid
This is what the new MagicQuant results look like. It presents quantized models from llama.cpp, learned Unsloth Dynamic tensor configurations, and MagicQuant hybrids.
As shown in this table, 3 MagicQuant hybrids made it through as final survival candidates. Two of them, MQ-Q6_K_1 and MQ-Q5_K_1, were discovered as nonlinear trade wins (a good read if you want to understand why "nonlinear good trades" is an important distinction) in the bit space between their quant family neighbors.
MQ-Q4_K_M_1, on the other hand, wasn't an in-between-space winner but what's called a "dominance" winner. This means it collapsed and removed UD-Q4_K_XL, LM-Q4_K_M, and other found MagicQuant hybrids as well.
Being a dominant quant does not only apply to hybrids. You can see that Unsloth Dynamic Q5_K_XL collapsed LM-Q5_K_M and LM-Q5_K_S, just as LM-Q4_K_S collapsed LM-IQ4_NL.
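The "dominance" collapse described above is essentially a Pareto filter: a candidate survives only if no other candidate is at least as small *and* at least as faithful. Here's a minimal sketch of that idea; the function names and tie-breaking rules are my assumptions, not MagicQuant's actual code, and the LM-Q5_K_M numbers are hypothetical illustrative values (it was collapsed, so it isn't in the published table):

```python
# (size_gb, kld) points. Real values taken from the Qwen3-4B table above,
# except LM-Q5_K_M, whose numbers are hypothetical for illustration.
candidates = {
    "UD-Q5_K_XL": (2.73, 0.009839),
    "MQ-Q5_K_1":  (2.88, 0.006632),
    "LM-Q6_K":    (3.08, 0.004640),
    "LM-Q5_K_M":  (2.90, 0.010500),  # hypothetical: bigger AND worse than UD-Q5_K_XL
}

def dominates(a, b):
    """True if point a = (size, kld) is at least as small and as faithful as b."""
    return a != b and a[0] <= b[0] and a[1] <= b[1]

def survivors(cands):
    """Keep only candidates not dominated by any other candidate."""
    return {
        name: pt for name, pt in cands.items()
        if not any(dominates(other, pt) for other in cands.values())
    }
```

Running `survivors(candidates)` drops LM-Q5_K_M (UD-Q5_K_XL is both smaller and lower-KLD) while the other three, which each trade size for fidelity, all remain.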
The 3 shown MagicQuant hybrids use the following configurations:
| Name | embeddings | attn_q | attn_kv | attn_output | ffn_up_gate | ffn_down |
|---|---|---|---|---|---|---|
| MQ-Q6_K_1 | Q8_0 | Q8_0 | Q8_0 | Q8_0 | Q6_K | Q8_0 |
| MQ-Q5_K_1 | Q8_0 | Q5_K | Q8_0 | Q6_K | UD-Q5_K_XL | Q5_K_S |
| MQ-Q4_K_M_1 | Q8_0 | Q5_K | Q8_0 | Q6_K | IQ4_XS | IQ4_XS |
I thought the MQ-Q4_K_M_1 result was cool. It's kind of like the pipeline said, "protect the brain, compress the muscle".
As you can see from the table, MagicQuant v2.0 detects which tensor groups are sensitive, determines which learned configs (whether from llama.cpp, Unsloth, or others) work best for each tensor group, and makes real, automated decisions to predict and build these hybrid models.
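If you want to hand-build something in the spirit of the MQ-Q4_K_M_1 row, recent llama.cpp `llama-quantize` accepts per-tensor overrides via repeated `--tensor-type` flags. A sketch that turns a group-to-type plan into a command line; the tensor-name patterns are my guesses at the relevant GGUF names, so verify them against your model's tensor list before trusting this:

```python
# Per-group quant plan loosely mirroring the MQ-Q4_K_M_1 row above.
# NOTE: pattern names (token_embd, attn_q, ...) are assumptions; check them
# against a GGUF dump of your model. attn_kv is split into attn_k / attn_v here.
plan = {
    "token_embd":  "q8_0",
    "attn_q":      "q5_k",
    "attn_k":      "q8_0",
    "attn_v":      "q8_0",
    "attn_output": "q6_k",
    "ffn_up":      "iq4_xs",
    "ffn_gate":    "iq4_xs",
    "ffn_down":    "iq4_xs",
}

def build_cmd(src, dst, plan, base="q4_k_m"):
    """Assemble a llama-quantize argv with per-tensor overrides."""
    cmd = ["llama-quantize"]
    for pattern, qtype in plan.items():
        cmd += ["--tensor-type", f"{pattern}={qtype}"]
    cmd += [src, dst, base]  # remaining tensors fall back to the base type
    return cmd
```

This is only a sketch of the mechanism; MagicQuant's value is in deciding *which* plan survives, not in the mechanics of applying one.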
Also note that all MagicQuant repos are published with json files that map additional information for those who want to better understand or look under the hood.
What's Different in v2.0
v2.0 is a full reset: new framework, new philosophy, new everything. v1.0 existed to teach me why it had to be thrown out.
The goal is simple: I’m tired of downloading Q8/Q6/Q5/Q4 and blindly guessing. I don’t want “feels better.” I want to know what’s actually worth the trade and trust what I’m downloading.
MagicQuant is the judge, not the quantization. It learns from existing tensor configs (llama.cpp, Unsloth, etc.), it does not make tensor-by-tensor decisions. I leave that to the people already pushing that frontier.
If you're interested in understanding how MagicQuant learns from, say, an Unsloth model (it's not limited to Unsloth), rips its quant configuration, and applies it in an isolated, equal-footing environment, read the documentation here. It's simpler than you'd think!
What I care about is: which tensor configurations actually win.
If an Unsloth model wins, I link directly to them. I’m not here to re-host or play “did I beat Unsloth?” especially when they’re doing legitimately cool things with imatrix and beyond. I’m not competing there.
I am competing on one thing:
proving what’s worth it.
Just a note: if it's an abliterated version of a model, or something Unsloth doesn't host, I will re-host their UD quant configuration, since it's not on an Unsloth repo directly.
Nonlinear wins
MagicQuant does not look for simple "winners" in the sub-space between baselines. Instead, it only allows nonlinear trade wins. The documentation presented later goes into further detail on this subject, but here's the TLDR:
Imagine a graph like this:

```
  Size →
 |
 |        Q6
 |       /
 |      /
 |   Q5
 |    /
 | Q4
 +----------------
```

A nonlinear win looks like:

```
        Q6
       /
      /  ← MQ-Q5_K_1 (above the line)
   Q5
    /
  Q4
```
That hybrid sits above the straight line between Q4 and Q5.
Meaning: 👉 It’s a more efficient trade than the normal step-up
This is what MagicQuant means by a "nonlinear trade/win" whenever that wording is used.
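In code, the check is just: does the candidate beat the straight line drawn between its neighboring baselines in (size, KLD) space? Since lower KLD is better, "above the quality line" means "below the interpolated KLD". A minimal sketch using the published numbers from the table above (function and variable names are mine, not the pipeline's):

```python
def interpolated_kld(size, lo, hi):
    """Linearly interpolate KLD at `size` between two (size, kld) baselines."""
    (s0, k0), (s1, k1) = lo, hi
    t = (size - s0) / (s1 - s0)
    return k0 + t * (k1 - k0)

def is_nonlinear_win(candidate, lo, hi):
    """True if the candidate's KLD beats the straight-line size/KLD trade."""
    size, kld = candidate
    return kld < interpolated_kld(size, lo, hi)

# Numbers from the Qwen3-4B table: MQ-Q5_K_1 sits between UD-Q5_K_XL and LM-Q6_K.
ud_q5 = (2.73, 0.009839)   # UD-Q5_K_XL
lm_q6 = (3.08, 0.004640)   # LM-Q6_K
mq_q5 = (2.88, 0.006632)   # MQ-Q5_K_1

print(is_nonlinear_win(mq_q5, ud_q5, lm_q6))  # True
```

At 2.88 GB the straight line between those two baselines predicts a KLD of roughly 0.0076; MQ-Q5_K_1 lands at 0.006632, so it's a more efficient trade than the normal step-up.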
Final
The new version I built is significantly stronger, more trustworthy, and more transparent; it's no longer a prototype-grade, backyard-moonshine science-lab setup. I worked hard to make something genuinely useful, unbiased, and production ready, something that finally hits more of my end goal.
Currently I only have a single model published showcasing MagicQuant v2.0 but I have the newest Qwen3.6 series baking right now. Thank you to everyone who has helped me, provided feedback, and taught me along the way ever since the weird MXFP4 prototype days.
My GitHub wiki has lots of documentation: how the prediction engine works, how I collapse a 51M-combination search space to under 350 combos, and much more. A few more docs are still in progress, just as an FYI. GitHub Wiki - where you can make requests, provide more feedback, etc.
For anyone interested, I’m keeping the models here: Huggingface Collection
I'm excited to get some of the newer Qwen3.6 models through the pipeline. I'll likely do the Gemma models after.