Seeing to Ground: Visual Attention for Hallucination-Resilient MDLLMs
arXiv:2603.25711v1 Announce Type: new
Abstract: Multimodal Diffusion Large Language Models (MDLLMs) achieve high-concurrency generation through parallel masked decoding, yet these architectures remain prone to multimodal hallucinations. This structural …