Jialuo Li, Bin Li, Jiahao Li, Yan Lu

Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding

Jialuo Li, Bin Li, Jiahao Li, Yan Lu / March 26, 2026

arXiv:2512.04000v2
Abstract: The application of Large Multimodal Models (LMMs) to long-form video understanding is constrained by limited context lengths and the computationally prohibitive cost of processing dense video t…
