From Attenuation to Attention: Variational Information Flow Manipulation for Fine-Grained Visual Perception
arXiv:2604.12508v1 Announce Type: new
Abstract: While Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities in general visual understanding, they frequently falter in fine-grained perception tasks that require identifying …