Authors: Imanol Miranda, Ander Salaberria, Eneko Agirre, Gorka Azkune

Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference

Imanol Miranda, Ander Salaberria, Eneko Agirre, Gorka Azkune / April 17, 2026

arXiv:2604.11496v2 Announce Type: replace-cross
Abstract: Dual-encoder Vision-Language Models (VLMs) such as CLIP are often characterized as bag-of-words systems due to their poor performance on compositional benchmarks. We argue that this limitation …

cs.CL, cs.CV, cs.LG

Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference

Imanol Miranda, Ander Salaberria, Eneko Agirre, Gorka Azkune / April 14, 2026

arXiv:2604.11496v1 Announce Type: cross
Abstract: Dual-encoder Vision-Language Models (VLMs) such as CLIP are often characterized as bag-of-words systems due to their poor performance on compositional benchmarks. We argue that this limitation may stem…