Aligning What Vision-Language Models See and Perceive with Adaptive Information Flow
arXiv:2604.15809v1 Announce Type: new
Abstract: Vision-Language Models (VLMs) have demonstrated strong capabilities across a wide range of tasks, such as visual recognition, document parsing, and visual grounding. Nevertheless, recent work shows that while V…