TRIO: Token Reduction via Inference-Objective Guidance for Efficient Vision-Language Models
arXiv:2602.04657v3 Announce Type: replace
Abstract: Reducing redundant visual tokens in vision-language models (VLMs) to accelerate inference has recently emerged as a hot topic. However, most existing methods rely on heuristics constructed based…