WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning
arXiv:2512.02425v2 Announce Type: replace
Abstract: Recent advances in video large language models have demonstrated strong capabilities in understanding short clips. However, scaling them to hours- or days-long videos remains highly challenging due t…