cs.CV

Towards Joint Quantization and Token Pruning of Vision-Language Models

arXiv:2604.17320v1 Announce Type: new
Abstract: Deploying Vision-Language Models (VLMs) under aggressive low-bit inference remains challenging because inference cost is dominated by the long visual-token prefix during prefill and the growing KV cache …
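The cost claim in the abstract can be illustrated with a back-of-the-envelope KV-cache model. Everything below is an illustrative assumption, not taken from the paper: the function `kv_cache_bytes` and all model dimensions (576 visual prefix tokens, 32 layers, 32 heads, head size 128) are hypothetical, chosen only to show how token pruning and cache quantization compound multiplicatively.

```python
# Hypothetical cost model (not from the paper): why pruning visual tokens
# and quantizing the KV cache jointly shrink VLM inference cost.

def kv_cache_bytes(num_tokens, num_layers, num_heads, head_dim, bits):
    # Factor of 2 covers keys and values; bits // 8 converts to bytes.
    return 2 * num_tokens * num_layers * num_heads * head_dim * bits // 8

# Baseline: 576 visual prefix tokens (e.g., a ViT patch grid) in an FP16 cache.
base = kv_cache_bytes(576, num_layers=32, num_heads=32, head_dim=128, bits=16)

# Joint compression: prune half the visual tokens AND store the cache in 4 bits.
joint = kv_cache_bytes(288, num_layers=32, num_heads=32, head_dim=128, bits=4)

print(base // joint)  # prints 8: 2x from pruning times 4x from quantization
```

The same multiplicative structure applies to prefill compute, since attention cost grows with the number of retained tokens; this is why the abstract frames quantization and token pruning as a joint problem rather than two independent knobs.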