PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking
arXiv:2604.17819v1 Announce Type: new
Abstract: Large language models (LLMs) perform substantially below human level on existing theory-of-mind (ToM) benchmarks, even when augmented with chain-of-thought prompting or probabilistic belief updates. We a…