AdaptToken: Entropy-based Adaptive Token Selection for MLLM Long Video Understanding
arXiv:2603.28696v1 Announce Type: new
Abstract: Long video understanding remains challenging for Multi-modal Large Language Models (MLLMs) due to high memory costs and context-length limits. Prior approaches mitigate this by scoring and selecting fram…