cs.CV

Prompt Sensitivity in Vision-Language Grounding: How Small Changes in Wording Affect Object Detection

arXiv:2604.17126v1 Announce Type: new
Abstract: Vision-language models enable open-vocabulary object grounding through natural language queries, under the implicit assumption that semantically equivalent descriptions yield consistent outputs. We exami…