cs.AI, cs.CV

Time Blindness: Why Video-Language Models Can’t See What Humans Can?

arXiv:2505.24867v2 Announce Type: replace-cross
Abstract: Recent vision-language models (VLMs) have made impressive strides in understanding spatio-temporal relationships in videos. However, when spatial information is obscured, these mode…