When Looking Is Not Enough: Visual Attention Structure Reveals Hallucination in MLLMs
arXiv:2605.11559v1 Announce Type: new
Abstract: Multimodal large language models (MLLMs) have become a key interface for visual reasoning and grounded question answering, yet they remain vulnerable to visual hallucinations, where generated responses c…