Contextual inference from single objects in Vision-Language models
arXiv:2603.26731v1 Announce Type: new
Abstract: How much scene context a single object carries is a well-studied question in human scene perception, yet how this capacity is organized in vision-language models (VLMs) remains poorly understood, with di…