I'm using the https://github.com/PrismML-Eng/llama.cpp fork for Bonsai and regular llama.cpp for Gemma. Excluding the embedding parameters: I could've gone with a smaller quant of Gemma 4, but conventional wisdom says not to push small models below Q4_K_M. I might try their ternary model later, but I don't have much hope...