cs.AI, cs.LG

REAP the Experts: Why Pruning Prevails for One-Shot MoE compression

arXiv:2510.13999v3 Announce Type: replace-cross
Abstract: Sparsely-activated Mixture-of-Experts (SMoE) models offer efficient pre-training and low latency, but their large parameter counts create significant memory overhead, motivating research into ex…