Sangin Lee, Yukyung Choi

CLIP Tricks You: Training-free Token Pruning for Efficient Pixel Grounding in Large Vision-Language Models

Sangin Lee, Yukyung Choi / May 14, 2026

arXiv:2605.13178v1 Announce Type: cross
Abstract: In large vision-language models, visual tokens typically constitute the majority of input tokens, leading to substantial computational overhead. To address this, recent studies have explored pruning re…
