Decoding the Pulse of Reasoning VLMs in Multi-Image Understanding Tasks

Chenjun Li / May 11, 2026

arXiv:2603.04676v2 Announce Type: replace
Abstract: Multi-image reasoning remains a significant challenge for vision-language models (VLMs). We investigate a previously overlooked phenomenon: during chain-of-thought (CoT) generation, the text-to-image…
