Know-Show: Benchmarking Video-Language Models on Spatio-Temporal Grounded Reasoning
arXiv:2512.05513v3 Announce Type: replace
Abstract: Large Video-Language Models (Video-LMs) have achieved impressive progress in multimodal understanding, yet their reasoning remains weakly grounded in space and time. We present Know-Show, a new bench…