VideoRouter: Query-Adaptive Dual Routing for Efficient Long-Video Understanding
arXiv:2605.05848v2 Announce Type: replace
Abstract: Video large multimodal models increasingly face a scalability bottleneck: long videos produce excessively long visual-token sequences, which sharply increase memory use and latency during inference. Whil…