Shaoguang Wang, Weiyu Guo, Ziyang Chen, Xuming Hu, Hui Xiong

Where to Focus: Query-Modulated Multimodal Keyframe Selection for Long Video Understanding

Shaoguang Wang, Weiyu Guo, Ziyang Chen, Xuming Hu, Hui Xiong / April 21, 2026

arXiv:2604.17422v1 Announce Type: new
Abstract: Long video understanding remains a formidable challenge for Multimodal Large Language Models (MLLMs) due to the prohibitive computational cost of processing dense frame sequences. Prevailing solutions, w…

Author name: Shaoguang Wang, Weiyu Guo, Ziyang Chen, Xuming Hu, Hui Xiong

Where to Focus: Query-Modulated Multimodal Keyframe Selection for Long Video Understanding