Evading Visual Aphasia: Contrastive Adaptive Semantic Token Pruning for Vision-Language Models
arXiv:2605.09429v1 Announce Type: cross
Abstract: Are low-attention visual tokens truly redundant in vision-language reasoning? Existing pruning methods often assume so, ranking visual tokens by shallow text-to-image attention and discarding low-scori…
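The abstract questions a common baseline: rank visual tokens by shallow text-to-image attention and discard the low scorers. The sketch below illustrates only that baseline, not the paper's contrastive adaptive method. Several details are assumptions made for illustration: scoring each visual token by its mean attention over text tokens (with heads already averaged), the `keep_ratio` parameter, and the function names `score_visual_tokens` and `prune_by_attention`.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def score_visual_tokens(attn: Sequence[Sequence[float]]) -> List[float]:
    """Score each visual token by its mean attention over text tokens.

    attn[t][v] is the attention weight from text token t to visual token v,
    assumed to come from one shallow layer with heads already averaged
    (an assumption for this sketch, not the paper's exact recipe).
    """
    if not attn or not attn[0]:
        raise ValueError("attention matrix must be non-empty")
    num_text = len(attn)
    num_visual = len(attn[0])
    return [sum(row[v] for row in attn) / num_text for v in range(num_visual)]


def prune_by_attention(
    tokens: Sequence[T],
    attn: Sequence[Sequence[float]],
    keep_ratio: float = 0.5,
) -> Tuple[List[int], List[T]]:
    """Keep the top-scoring fraction of visual tokens; drop the rest.

    Returns the kept indices (in original spatial order) and the kept tokens.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    scores = score_visual_tokens(attn)
    if len(scores) != len(tokens):
        raise ValueError("attention columns must match the number of visual tokens")
    k = max(1, int(keep_ratio * len(scores)))
    # Rank by descending score; break ties by index so the result is deterministic.
    ranked = sorted(range(len(scores)), key=lambda v: (-scores[v], v))
    # Re-sort the survivors so the pruned sequence keeps its spatial order.
    kept = sorted(ranked[:k])
    return kept, [tokens[v] for v in kept]
```

For example, with two text tokens attending over four visual patches, `prune_by_attention(["p0", "p1", "p2", "p3"], [[0.1, 0.6, 0.2, 0.1], [0.1, 0.4, 0.3, 0.2]])` keeps patches 1 and 2. Any patch dropped this way is gone for all later layers, which is exactly the redundancy assumption the abstract calls into question.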