Tango: Taming Visual Signals for Efficient Video Large Language Models
arXiv:2604.09547v2
Abstract: Token pruning has emerged as a mainstream approach for developing efficient Video Large Language Models (Video LLMs). This work revisits and advances the two predominant token-pruning paradigms: atte…