Do Vision-Language Models Truly Perform Vision Reasoning? A Rigorous Study of the Modality Gap
arXiv:2604.16256v1 Announce Type: cross
Abstract: Reasoning in vision-language models (VLMs) has recently attracted significant attention due to its broad applicability across diverse downstream tasks. However, it remains unclear whether the superior …