DUALVISION: RGB-Infrared Multimodal Large Language Models for Robust Visual Reasoning
arXiv:2604.18829v1 Announce Type: new
Abstract: Multimodal large language models (MLLMs) have achieved impressive performance on visual perception and reasoning tasks with RGB imagery, yet they remain fragile under common degradations, such as fog, bl…