cs.CV, cs.LG

Rethinking Model Selection in VLM Through the Lens of Gromov-Wasserstein Distance

arXiv:2605.01325v1 Announce Type: new
Abstract: Vision-Language Models (VLMs) extend traditional large language models (LLMs) with visual capabilities by integrating vision encoders. While recent works have explored various combinations of vision encoders …