cs.CV

CREG: Compass Relational Evidence Graph for Characterizing Directional Structure in VLM Spatial-Reasoning Attribution

arXiv:2603.20475v2 Announce Type: replace
Abstract: Vision-language models (VLMs) can answer spatial-relation queries, yet a correct answer does not reveal whether the model actually uses directional evidence or merely exploits object layout. We present …