MovieRecapsQA: A Multimodal Open-Ended Video Question-Answering Benchmark
arXiv:2601.02536v2 Announce Type: replace
Abstract: Understanding real-world videos such as movies requires integrating visual and dialogue cues. Yet existing VideoQA benchmarks struggle to capture this multimodal reasoning and, given the difficulty o…