To See or To Please: Uncovering Visual Sycophancy and Split Beliefs in VLMs
arXiv:2603.18373v2 Announce Type: replace
Abstract: When vision-language models (VLMs) answer correctly, do they genuinely rely on visual information or exploit language shortcuts? We introduce the Tri-Layer Diagnostic Framework, which disentangles hallucination sources via t…