Evaluating Language Models’ Evaluations of Games
arXiv:2510.10930v2 Announce Type: replace
Abstract: Reasoning is not just about solving problems — it is also about evaluating which problems are worth solving at all. Evaluations of artificial intelligence (AI) systems primarily focused on problem s…