Gemma 4 just casually destroyed every model on our leaderboard except Opus 4.6 and GPT-5.2. 31B params, $0.20/run

Tested Gemma 4 (31B) on our benchmark. Genuinely did not expect this.

100% survival, 5 out of 5 runs profitable, +1,144% median ROI. At $0.20 per run.

It outperforms GPT-5.2 ($4.43/run), Gemini 3 Pro ($2.95/run), Sonnet 4.6 ($7.90/run), and absolutely destroys every Chinese open-source model we've tested — Qwen 3.5 397B, Qwen 3.5 9B, DeepSeek V3.2, GLM-5. None of them even survive consistently.

The only model that beats Gemma 4 is Opus 4.6 at $36 per run. That's 180× more expensive.
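The cost comparison is easy to sanity-check yourself. A quick sketch using only the per-run prices quoted in this post:

```python
# Cost-per-run figures quoted in the post (USD)
costs = {
    "Gemma 4 (31B)": 0.20,
    "Gemini 3 Pro": 2.95,
    "GPT-5.2": 4.43,
    "Sonnet 4.6": 7.90,
    "Opus 4.6": 36.00,
}

# How many times more expensive each model is per run, relative to Gemma 4
gemma = costs["Gemma 4 (31B)"]
for model, cost in costs.items():
    print(f"{model}: {cost / gemma:.0f}x the cost of Gemma 4")
```

Opus 4.6 comes out at 180x, which is where the figure above comes from.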

31 billion parameters. Twenty cents. We double-checked the config, the prompt, the model ID — everything is identical to every other model on the leaderboard. Same seed, same tools, same simulation. It's just genuinely this good.

Strongly recommend trying it for your agentic workflows. We've tested 22 models so far and this is by far the best cost-to-performance ratio we've ever seen.

Full breakdown with charts and day-by-day analysis: foodtruckbench.com/blog/gemma-4-31b

FoodTruck Bench is an AI business simulation benchmark — the agent runs a food truck for 30 days, making decisions about location, menu, pricing, staff, and inventory. Leaderboard at foodtruckbench.com

submitted by /u/Disastrous_Theme5906
