AIM-CoT: Active Information-driven Multimodal Chain-of-Thought for Vision-Language Reasoning
arXiv:2509.25699v3 Announce Type: replace
Abstract: Interleaved-Modal Chain-of-Thought (I-MCoT) advances vision-language reasoning on tasks such as Visual Question Answering (VQA). This paradigm integrates specially selected visual evidence from the input ima…