Xinhao Wang, Zhonyu Xia, Zhiwei Lin, Zhe Li, Yongtao Wang

QAPruner: Quantization-Aware Vision Token Pruning for Multimodal Large Language Models

Xinhao Wang, Zhonyu Xia, Zhiwei Lin, Zhe Li, Yongtao Wang / April 6, 2026

arXiv:2604.02816v1 Announce Type: new
Abstract: Multimodal Large Language Models (MLLMs) have shown strong reasoning ability, but their high computational and memory costs hinder deployment in resource-constrained settings. While Post-Training Quantiz…