cs.CV

EGM: Efficient Visual Grounding Language Models

arXiv:2601.13633v3 Announce Type: replace
Abstract: Visual grounding is an essential capability for Visual Language Models (VLMs) to understand the physical world. Previous state-of-the-art grounding visual language models usually have large model…