Gemma 4 models feel very different depending on size (26B vs 31B)

I spent a few hours trying out the new Gemma 4 models, and one thing stood out pretty quickly: the difference between sizes is more noticeable than I expected.

Didn’t run any formal benchmarks, just hands-on usage.

Tested:

Gemma-4-26B-A4B-it
Gemma-4-31B-it

Mostly used them for:

some coding (Python + small scripts)
general prompts
a few longer, slightly more complex instructions

🧠 31B (Gemma-4-31B-it)

This one feels a lot more stable once prompts get even a little complex.

Better at following multi-step instructions
Less likely to drift or “lose the thread”
Coding outputs were more consistent

For simple stuff, it doesn’t feel massively different. But as soon as you stack a few requirements together, the gap shows up pretty clearly.

Downside is just what you’d expect: slower and more expensive.

⚡ 26B (Gemma-4-26B-A4B-it)

This one actually surprised me.

Very fast and responsive
Totally fine for most day-to-day use
Feels good for quick testing / iteration

It does start to break down a bit on more layered prompts or when you need tighter reasoning, but nothing unexpected.

I ran both in a hosted notebook setup just to save time on local config.
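
If anyone wants to put a rough number on the "layered prompt" gap instead of going by feel, here's a minimal sketch of how that could look. To be clear, this is not something I ran for this post. The `generate` callables are hypothetical stand-ins for however you call each model, and the prompt and string checks are only illustrative assumptions, not a real benchmark.

```python
from typing import Callable, Dict

# A check takes the model's reply and returns True if one requirement is met.
Check = Callable[[str], bool]


def score_reply(reply: str, checks: Dict[str, Check]) -> float:
    """Return the fraction of requirements that the reply satisfies."""
    if not checks:
        return 0.0
    passed = sum(1 for check in checks.values() if check(reply))
    return passed / len(checks)


def compare_models(
    models: Dict[str, Callable[[str], str]],  # name -> hypothetical generate(prompt)
    prompt: str,
    checks: Dict[str, Check],
) -> Dict[str, float]:
    """Send the same prompt to every model and score each reply."""
    return {
        name: score_reply(generate(prompt), checks)
        for name, generate in models.items()
    }


# Illustrative prompt with three stacked requirements (an assumption, not from my tests).
PROMPT = (
    "Write a Python function named add that takes two ints, "
    "include a docstring, and use no imports."
)

# Crude string checks; real checks would likely parse or run the code.
CHECKS: Dict[str, Check] = {
    "named add": lambda reply: "def add(" in reply,
    "has docstring": lambda reply: '"""' in reply,
    "no imports": lambda reply: "import " not in reply,
}
```

You'd pass something like `{"26B": call_26b, "31B": call_31b}` as `models`, where both functions are whatever wrappers you write around your own setup, then repeat over several prompts with more and more requirements to see where each model starts dropping them.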

Curious if others are seeing the same kind of gap, or if this depends a lot on the setup/use case.

submitted by /u/still_debugging_note