Jianxiang He, Meisheng Hong, Jungang Li, Weiyu Guo, Xuming Hu, Hui Xiong

VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding

April 13, 2026

arXiv:2508.06869v4
Abstract: Multimodal large language models (MLLMs) demonstrate exceptional performance in vision-language tasks, yet their processing of long videos is constrained by input context length and high computationa…
