cs.AI, cs.CL, cs.LG

Prune-Quantize-Distill: An Ordered Pipeline for Efficient Neural Network Compression

arXiv:2604.04988v1 Announce Type: cross
Abstract: Modern deployment often requires trading accuracy for efficiency under tight CPU and memory constraints, yet common compression proxies such as parameter count or FLOPs do not reliably predict wall-clock…
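The ordered pipeline named in the title can be illustrated with a minimal sketch. This is not the paper's implementation; it shows the three generic stages on plain NumPy arrays: magnitude pruning, symmetric uniform quantization, and a temperature-softened distillation loss. All function names and hyperparameters (`sparsity`, `bits`, `T`) are illustrative assumptions, not drawn from the paper.

```python
import numpy as np

def prune(weights, sparsity=0.5):
    """Magnitude pruning: zero out the smallest-|w| fraction of weights."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize(weights, bits=8):
    """Symmetric uniform quantization to `bits` bits, then dequantize
    (simulated quantization, as in quantization-aware evaluation)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -qmax, qmax)
    return q * scale

def distill_loss(teacher_logits, student_logits, T=2.0):
    """Soft-target distillation: KL divergence between the teacher's and
    student's temperature-softened softmax distributions, scaled by T^2."""
    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
    p = softmax(np.asarray(teacher_logits, dtype=float) / T)
    q = softmax(np.asarray(student_logits, dtype=float) / T)
    return float(np.sum(p * (np.log(p) - np.log(q))) * T * T)

# Applying the stages in the title's order: prune, then quantize;
# the distillation loss would then drive fine-tuning of the student.
w = np.array([1.0, -0.1, 2.0, 0.05])
w_compressed = quantize(prune(w, sparsity=0.5), bits=8)
```

The ordering matters in such pipelines because each stage changes the weight distribution the next stage sees (e.g. pruning concentrates mass away from zero before the quantizer picks its scale); the paper's contribution concerns how this ordering interacts with real wall-clock efficiency rather than proxy metrics.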