Ashish Baghel, Paras Chopra

See, Symbolize, Act: Grounding VLMs with Spatial Representations for Better Gameplay

Ashish Baghel, Paras Chopra / March 30, 2026

arXiv:2603.11601v2
Abstract: Vision-Language Models (VLMs) excel at describing visual scenes, yet struggle to translate perception into precise, grounded actions. We investigate whether providing VLMs with both the visual frame …
