LearnPruner: Rethinking Attention-based Token Pruning in Vision Language Models
arXiv:2604.23950v1 Announce Type: new
Abstract: Vision-Language Models (VLMs) have recently demonstrated remarkable capabilities in visual understanding and reasoning, but they also impose significant computational burdens due to long visual sequence …