cs.CV

Perceive, Verify and Understand Long Video: Multi-Granular Perception and Active Verification via Interactive Agents

arXiv:2509.24943v2 Announce Type: replace
Abstract: Long videos, characterized by temporal complexity and sparse task-relevant information, pose significant reasoning challenges for AI systems. Although existing Large Language Model (LLM)-based approa…