cs.AI, cs.CV

QAPruner: Quantization-Aware Vision Token Pruning for Multimodal Large Language Models

arXiv:2604.02816v1 Announce Type: new
Abstract: Multimodal Large Language Models (MLLMs) have shown strong reasoning ability, but their high computational and memory costs hinder deployment in resource-constrained settings. While Post-Training Quantiz…