cs.CV

SpatiO: Adaptive Test-Time Orchestration of Vision-Language Agents for Spatial Reasoning

arXiv:2604.21190v1 Announce Type: new
Abstract: Understanding visual scenes requires not only recognizing objects but also reasoning about their spatial relationships. Unlike general vision-language tasks, spatial reasoning requires integrating multip…