cs.AI, cs.CL, cs.CV, cs.IR, cs.LG

WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning

arXiv:2512.02425v2 Announce Type: replace
Abstract: Recent advances in video large language models have demonstrated strong capabilities in understanding short clips. However, scaling them to hours- or days-long videos remains highly challenging due t…