cs.AI, cs.CL, cs.CV

POP: Prefill-Only Pruning for Efficient Large Model Inference

arXiv:2602.03295v2 Announce Type: replace-cross
Abstract: Large Language Models (LLMs) and Vision-Language Models (VLMs) have demonstrated remarkable capabilities. However, their deployment is hindered by significant computational costs. Existing stru…