GLM 5.1 sits alongside frontier models in my social reasoning benchmark

I still need more matches for reliable data, but GLM 5.1 looks very competitive with other frontier models.

This uses a benchmark I made that pits LLMs against each other in autonomous games of Blood on the Clocktower (a complex social deduction game). The last screenshot shows GLM 5.1 playing as the evil team (red).

For contrast,
Claude Opus 4.6 costs $3.69 per game.
GLM 5.1 costs $0.92 per game.

With a 0% tool error rate.

Very impressive.
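
To put the two per-game costs above side by side, here is a quick sketch of the arithmetic. The only real inputs are the two prices quoted in this post; the function names and the $100 budget are made up for illustration and are not part of the benchmark.

```python
# Per-game costs quoted in the post (USD).
COST_PER_GAME = {
    "Claude Opus 4.6": 3.69,
    "GLM 5.1": 0.92,
}


def cost_ratio(expensive: str, cheap: str) -> float:
    """How many times more one model costs per game than another."""
    return COST_PER_GAME[expensive] / COST_PER_GAME[cheap]


def games_for_budget(model: str, budget_usd: float) -> int:
    """Whole games that fit in a budget (the budget is a hypothetical input)."""
    return int(budget_usd // COST_PER_GAME[model])


ratio = cost_ratio("Claude Opus 4.6", "GLM 5.1")
savings = 1 - COST_PER_GAME["GLM 5.1"] / COST_PER_GAME["Claude Opus 4.6"]

print(f"Opus costs {ratio:.2f}x as much per game as GLM 5.1")
print(f"GLM 5.1 is {savings:.0%} cheaper per game")

# Hypothetical $100 budget, purely to illustrate the gap.
for model in COST_PER_GAME:
    print(f"{model}: {games_for_budget(model, 100.0)} games for $100")
```

So at these prices GLM 5.1 comes in at roughly a quarter of the cost per game, which matters a lot when the benchmark still needs many more matches.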

submitted by /u/cjami