cs.AI, cs.CL, cs.LG

Prune-Quantize-Distill: An Ordered Pipeline for Efficient Neural Network Compression

arXiv:2604.04988v1 Announce Type: cross
Abstract: Modern deployment often requires trading accuracy for efficiency under tight CPU and memory constraints, yet common compression proxies such as parameter count or FLOPs do not reliably predict wall-clock…
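The ordered pipeline named in the title can be illustrated with a minimal sketch. This is not the paper's implementation; it shows the three generic stages on plain NumPy arrays: magnitude pruning, symmetric uniform quantization, and a temperature-softened distillation loss. All function names and hyperparameters (`sparsity`, `bits`, `T`) are illustrative assumptions, not drawn from the paper.

```python
import numpy as np

def prune(weights, sparsity=0.5):
    """Magnitude pruning: zero out the smallest-|w| fraction of weights."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize(weights, bits=8):
    """Symmetric uniform quantization to `bits` bits, then dequantize
    (simulated quantization, as in quantization-aware evaluation)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -qmax, qmax)
    return q * scale

def distill_loss(teacher_logits, student_logits, T=2.0):
    """Soft-target distillation: KL divergence between the teacher's and
    student's temperature-softened softmax distributions, scaled by T^2."""
    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
    p = softmax(np.asarray(teacher_logits, dtype=float) / T)
    q = softmax(np.asarray(student_logits, dtype=float) / T)
    return float(np.sum(p * (np.log(p) - np.log(q))) * T * T)

# Applying the stages in the title's order: prune, then quantize;
# the distillation loss would then drive fine-tuning of the student.
w = np.array([1.0, -0.1, 2.0, 0.05])
w_compressed = quantize(prune(w, sparsity=0.5), bits=8)
```

The ordering matters in such pipelines because each stage changes the weight distribution the next stage sees (e.g. pruning concentrates mass away from zero before the quantizer picks its scale); the paper's contribution concerns how this ordering interacts with real wall-clock efficiency rather than proxy metrics.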