ERASE: Eliminating Redundant Visual Tokens via Adaptive Two-Stage Token Pruning
arXiv:2605.09982v1 Announce Type: new
Abstract: Recent advancements in Vision-Language Models (VLMs) enable large language models (LLMs) to process high-resolution images, significantly improving real-world multimodal understanding. However, this capa…