cs.CV

GraphThinker: Reinforcing Temporally Grounded Video Reasoning with Event Graph Thinking

arXiv:2602.17555v3 Announce Type: replace
Abstract: Video reasoning requires fine-grained understanding of the temporal dependencies and event-level relations between objects and events in videos. Current Multimodal Large Language Models (MLLMs) are…