Yashwant Pravinrao Bangde, Debaditya Roy

Instruction-Evidence Contrastive Dual-Stream Decoding for Grounded Vision-Language Reasoning

Yashwant Pravinrao Bangde, Debaditya Roy / April 29, 2026

arXiv:2604.25809v1 Announce Type: new
Abstract: Vision-Language Models (VLMs) exhibit strong performance in instruction following and open-ended vision-language reasoning, yet they frequently generate fluent outputs that are weakly grounded in visual …