Visuospatial Perspective Taking in Multimodal Language Models
arXiv:2603.23510v1 Announce Type: cross
Abstract: As multimodal language models (MLMs) are increasingly used in social and collaborative settings, it is crucial to evaluate their perspective-taking abilities. Existing benchmarks largely rely on text-b…