Do Thought Streams Matter? Evaluating Reasoning in Gemini Vision-Language Models for Video Scene Understanding
arXiv:2604.11177v1 Announce Type: new
Abstract: We benchmark how internal reasoning traces, which we call thought streams, affect video scene understanding in vision-language models. Using four configurations of Google’s Gemini 2.5 Flash and Flash Lit…