cs.CV

Progressive Video Condensation with MLLM Agent for Long-form Video Understanding

arXiv:2604.02891v1 Announce Type: new
Abstract: Understanding long videos requires extracting query-relevant information from long sequences under tight compute budgets. Existing text-then-LLM pipelines lose fine-grained visual cues, while video-based…