Collaborative Multi-Mode Pruning for Vision-Language Models
arXiv:2604.02956v1
Abstract: Vision-Language Models (VLMs) have advanced rapidly within the unified Transformer architecture, yet their deployment on resource-constrained devices remains challenging due to high computational complex…