Language Models Might Not Understand You: Evaluating Theory of Mind via Story Prompting
arXiv:2506.19089v5 Announce Type: replace
Abstract: We introduce StorySim, a programmable framework for synthetically generating stories to evaluate the theory of mind (ToM) and world modeling (WM) capabilities of large language models (LLMs). Unlike …