cs.AI, cs.CL, cs.CV, cs.LG

Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models

arXiv:2604.14888v1 Announce Type: cross
Abstract: Recent advances in vision-language models (VLMs) offer reasoning capabilities, yet how this reasoning unfolds and integrates visual and textual information remains unclear. We analyze reasoning dynamics in 18 VLM…