EchoPrune: Interpreting Redundancy as Temporal Echoes for Efficient VideoLLMs
arXiv:2605.10050v1 Announce Type: new
Abstract: Long-form video understanding remains challenging for Video Large Language Models (VideoLLMs), as dense frame sampling introduces a massive number of visual tokens while sparse sampling risks missing critical te…