Ruozhen He, Nisarg A. Shah, Qihua Dong, Zilin Xiao, Jaywon Koo, Vicente Ordonez

Beyond Referring Expressions: Scenario Comprehension Visual Grounding

Ruozhen He, Nisarg A. Shah, Qihua Dong, Zilin Xiao, Jaywon Koo, Vicente Ordonez / April 3, 2026

arXiv:2604.02323v1 Announce Type: new
Abstract: Existing visual grounding benchmarks primarily evaluate alignment between image regions and literal referring expressions, where models can often succeed by matching a prominent named category. We explor…

Author name: Ruozhen He, Nisarg A. Shah, Qihua Dong, Zilin Xiao, Jaywon Koo, Vicente Ordonez

Beyond Referring Expressions: Scenario Comprehension Visual Grounding