cs.AI, cs.CV

Contextual inference from single objects in Vision-Language models

arXiv:2603.26731v1 Announce Type: new
Abstract: How much scene context a single object carries is a well-studied question in human scene perception, yet how this capacity is organized in vision-language models (VLMs) remains poorly understood, with di…