Task-Related Token Compression in Multimodal Large Language Models from an Explainability Perspective
arXiv:2506.01097v2 Announce Type: replace
Abstract: Existing Multimodal Large Language Models (MLLMs) process a large number of visual tokens, leading to significant computational costs and inefficiency. Instruction-related visual token compression de…