cs.LG

AIMER: Calibration-Free Task-Agnostic MoE Pruning

arXiv:2603.18492v2 Announce Type: replace
Abstract: Mixture-of-Experts (MoE) language models increase parameter capacity without proportional per-token compute, but deployment still requires storing all experts, making expert pruning important for…
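
To make the memory-vs-compute asymmetry concrete, below is a minimal PyTorch sketch of a generic top-k MoE layer (not the paper's AIMER method): only k experts run per token, yet every expert must be stored, which is what expert pruning reduces. The `TopKMoE` class and its `prune_experts` helper are hypothetical illustrations, not an API from the paper.

```python
# Minimal sketch of a top-k MoE feed-forward layer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        logits = self.router(x)                  # (tokens, num_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):               # per-token compute touches only k experts
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

    def prune_experts(self, keep):
        # Hypothetical pruning helper: keep only the listed experts and restrict
        # the router to them, shrinking the parameters that must be stored.
        self.experts = nn.ModuleList(self.experts[i] for i in keep)
        self.router.weight = nn.Parameter(self.router.weight[keep].clone())
        self.k = min(self.k, len(keep))

moe = TopKMoE()
tokens = torch.randn(16, 64)
print(moe(tokens).shape)             # forward pass uses only k=2 experts per token
moe.prune_experts([0, 2, 5, 7])      # stored experts drop from 8 to 4
print(moe(tokens).shape)
```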