cs.CV, cs.LG

Information Router for Mitigating Modality Dominance in Vision-Language Models

arXiv:2604.16264v1 Announce Type: cross
Abstract: Vision-Language Models (VLMs) have demonstrated strong performance across a wide range of benchmarks, yet they often suffer from modality dominance, where predictions rely disproportionately on a singl…