Gemma 4 31B passed 7/8 real-world production tests — including ones I designed to make it fail. Full prompts + outputs.

I've been waiting for a capable free local LLM for a while. I think we're close — the quality is getting there fast, and Gemma 4 is the first open-weight model where I genuinely considered using it in production for simple-to-medium tasks.

To test that instinct, I ran both models (31B Dense and 26B A4B MoE) through 8 real-world tasks: not benchmarks, but actual prompts I'd use at work. I've shared everything so you can run the same tests yourself:

- All 8 prompts, copy-paste ready

- Full model outputs for the longer tests

- Demo app source (single HTML file, just needs a free AI Studio key)

Results were verified independently by Gemini 3.1 Pro and Claude Opus 4.6.

https://github.com/useaitechdad/explore-gemma4

*Note: I ran these tests via Genai API (Gemma 4 hosted on GCP), not locally. A friend runs the 31B locally and reports similar performance, but these specific tests were cloud-run.*

submitted by /u/grassxyz