POP: Prefill-Only Pruning for Efficient Large Model Inference
arXiv:2602.03295v2
Abstract: Large Language Models (LLMs) and Vision-Language Models (VLMs) have demonstrated remarkable capabilities. However, their deployment is hindered by significant computational costs. Existing stru…