GLM 5.1 sits alongside frontier models in my social reasoning benchmark
I still need more matches for reliable data, but GLM 5.1 looks very competitive with other frontier models. This uses a benchmark I made that pits LLMs against each other in autonomous games of Blood on the Clocktower (a complex social deducti…