Ujjwal Upadhyay, Mukul Ranjan, Zhiqiang Shen, Mohamed Elhoseiny

Time Blindness: Why Video-Language Models Can’t See What Humans Can?

Ujjwal Upadhyay, Mukul Ranjan, Zhiqiang Shen, Mohamed Elhoseiny / April 30, 2026

arXiv:2505.24867v2 Announce Type: replace-cross
Abstract: Recent advances in vision-language models (VLMs) have made impressive strides in understanding spatio-temporal relationships in videos. However, when spatial information is obscured, these mode…

Author name: Ujjwal Upadhyay, Mukul Ranjan, Zhiqiang Shen, Mohamed Elhoseiny

Time Blindness: Why Video-Language Models Can’t See What Humans Can?