cs.LG

Concrete Jungle: Towards Concreteness Paved Contrastive Negative Mining for Compositional Understanding

arXiv:2604.13313v1
Abstract: Vision-Language Models demonstrate remarkable capabilities but often struggle with compositional reasoning, showing weaknesses in handling word order and attribute binding. This limitation arises f…