cs.AI, cs.CV

LensWalk: Agentic Video Understanding by Planning How You See in Videos

arXiv:2603.24558v1 Announce Type: cross
Abstract: The dense, temporal nature of video presents a profound challenge for automated analysis. Despite the use of powerful Vision-Language Models, prevailing methods for video understanding are limited by t…