cs.AI, cs.CV

Beyond Cross-Modal Alignment: Measuring and Leveraging Modality Gap in Vision-Language Models

arXiv:2502.14888v4 Announce Type: replace-cross
Abstract: The success of vision-language models is primarily attributed to effective alignment across modalities such as vision and language. However, modality gaps persist in existing alignment algorith…