Following an impressive shake-up by Kimi K2.6, I've now got results for Xiaomi's MiMo-V2.5-Pro. For context, this is based on a benchmark I've created that pits models against each other in autonomous games of Blood on the Clocktower, a highly complex social deduction game. If you're unfamiliar, it's like Mafia/Werewolf or The Traitors TV show.

MiMo-V2.5-Pro joins Kimi K2.6 as another dominant player, with both models pulling away from the crowd in their own class. Note I have not yet benched GPT 5.5 (Xhigh) or Claude Opus 4.7 (Max), which may also land in this tier. Interestingly, its win rate is a bit lopsided (Good 88% / Evil 48%): an extremely high good-team win rate, but a poorer evil-team win rate that keeps it from the top spot.

Why MiMo-V2.5-Pro over Kimi K2.6? Kimi K2.6 has incredibly verbose reasoning at 580,000 average output tokens per game, leading to a $2.65/game cost; this also means long response times, with matches taking around 10-15 hours to complete. It feels impractical for many use cases. MiMo-V2.5-Pro, on the other hand, while slightly verbose at 183,639 tokens per game (similar to Gemini 3.1 Pro's verbosity), costs less than half as much at a cooler $0.99/game. For comparison at the high end, Claude Opus 4.6 costs $3.76/game. Matches also usually finish in a typical 2-3 hours (unless playing against Kimi). It is also fairly reliable, with a 0.4% tool-call error rate. This currently makes it the best-value model at the top end of the group.

Notable moves:
Notable mistakes:
MiMo-V2.5-Pro transcripts: https://clocktower-radio.com/search?a=MiMo-V2.5-Pro

How-it-works: https://clocktower-radio.com/how-it-works
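For anyone curious about the value comparison, the per-game figures quoted above imply an effective price per output token. A quick back-of-envelope sketch (this ignores input-token costs, since only per-game totals are quoted, so treat the results as rough ballpark numbers):

```python
# Rough effective output-token pricing from the per-game figures in the post.
# Caveat: $/game includes input tokens too, so these are upper-bound ballparks.
models = {
    # name: (avg output tokens per game, cost per game in USD)
    "Kimi K2.6":     (580_000, 2.65),
    "MiMo-V2.5-Pro": (183_639, 0.99),
}

for name, (tokens, cost) in models.items():
    per_million = cost / tokens * 1_000_000
    print(f"{name}: ~${per_million:.2f} per 1M output tokens")
# Kimi K2.6: ~$4.57 per 1M output tokens
# MiMo-V2.5-Pro: ~$5.39 per 1M output tokens
```

So MiMo's per-token rate is actually a touch higher; the per-game savings come almost entirely from it emitting roughly a third as many tokens.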