cs.CV

SYNCR: A Cross-Video Reasoning Benchmark with Synthetic Grounding

arXiv:2605.08412v1 Announce Type: new
Abstract: Multimodal Large Language Models (MLLMs) have made rapid progress in single-video understanding, yet their ability to reason across multiple independent video streams remains poorly understood. Existing …