cs.AI, cs.CL

Language Models Might Not Understand You: Evaluating Theory of Mind via Story Prompting

arXiv:2506.19089v5 Announce Type: replace
Abstract: We introduce StorySim, a programmable framework for synthetically generating stories to evaluate the theory of mind (ToM) and world modeling (WM) capabilities of large language models (LLMs). Unlike …