SVAgent: Storyline-Guided Long Video Understanding via Cross-Modal Multi-Agent Collaboration
arXiv:2604.05079v1 Announce Type: new
Abstract: Video question answering (VideoQA) is a challenging task that requires integrating spatial, temporal, and semantic information to capture the complex dynamics of video sequences. Although recent advances…