Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models
arXiv:2505.03821v2
Abstract: We investigate the ability of Vision Language Models (VLMs) to perform visual perspective taking using a new set of visual tasks inspired by established human tests. Our approach leverages carefully …