VideoARM: Agentic Reasoning over Hierarchical Memory for Long-Form Video Understanding
arXiv:2512.12360v2 Announce Type: replace-cross
Abstract: Long-form video understanding remains challenging because of its extended temporal structure and dense multimodal cues. Despite recent progress, many existing approaches still rely on hand-crafted …