Jiayun Luo, Mir Rayat Imtiaz Hossain, Pritam Sarkar, Boyang Li, Leonid Sigal

The ART of Composition: Attention-Regularized Training for Compositional Visual Grounding

May 8, 2026

arXiv:2412.08110v3
Abstract: Vision-Language Models (VLMs) have achieved strong performance on implicit and explicit visual grounding and related tasks. However, such abilities are generally tested on simple, single-object…
