Seunghwan Bang, Hwanjun Song

Reasoning over Video: Evaluating How MLLMs Extract, Integrate, and Reconstruct Spatiotemporal Evidence

Seunghwan Bang, Hwanjun Song / April 20, 2026

arXiv:2603.13091v2 Announce Type: replace
Abstract: The growing interest in embodied agents increases the demand for spatiotemporal video understanding, yet existing benchmarks largely emphasize extractive reasoning, where answers can be explicitly pr…
