cs.LG

EvoESAP: Non-Uniform Expert Pruning for Sparse MoE

arXiv:2603.06003v2 Announce Type: replace
Abstract: Sparse Mixture-of-Experts (SMoE) language models achieve strong capability at low per-token compute, yet deployment remains constrained by memory footprint and throughput because the full expert pool…
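The (truncated) abstract points at pruning the expert pool of an SMoE to cut the resident memory footprint, with the title indicating that the pruning budget is non-uniform, i.e. it can differ from layer to layer. Below is a minimal, hypothetical sketch of what non-uniform expert pruning can look like on a toy top-1 MoE layer; the ToyMoELayer class, the importance-free hand-picked keep lists, and the per-layer budgets are illustrative assumptions only, not the EvoESAP procedure described in the paper.

```python
# Minimal sketch: non-uniform expert pruning of toy MoE layers (illustrative only).
import torch
import torch.nn as nn


class ToyMoELayer(nn.Module):
    """A toy top-1 MoE FFN layer: a linear router plus a pool of expert MLPs."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); send each token to its single highest-scoring expert.
        top1 = self.router(x).argmax(dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top1 == e
            if mask.any():
                out[mask] = expert(x[mask])
        return out


def prune_experts(layer: ToyMoELayer, keep_ids: list[int]) -> ToyMoELayer:
    """Keep only the experts in `keep_ids` and shrink the router to match.

    Non-uniform pruning means len(keep_ids) may differ across layers,
    e.g. chosen from per-expert importance scores (hypothetical here).
    """
    d_model = layer.router.in_features
    pruned = ToyMoELayer(d_model, len(keep_ids))
    with torch.no_grad():
        # Slice the surviving router rows so routing indices stay aligned
        # with the retained experts, then copy those experts' weights over.
        pruned.router.weight.copy_(layer.router.weight[keep_ids])
        for new_e, old_e in enumerate(keep_ids):
            pruned.experts[new_e].load_state_dict(layer.experts[old_e].state_dict())
    return pruned


if __name__ == "__main__":
    torch.manual_seed(0)
    layers = [ToyMoELayer(d_model=16, num_experts=8) for _ in range(3)]
    # Hypothetical non-uniform budgets: each layer keeps a different number of experts.
    keep_per_layer = [[0, 1, 2, 3, 4, 5], [0, 2, 5, 7], [1, 6]]
    pruned_layers = [prune_experts(l, k) for l, k in zip(layers, keep_per_layer)]
    x = torch.randn(4, 16)
    for layer in pruned_layers:
        x = layer(x)
    print(x.shape)  # torch.Size([4, 16])
```

The key detail in this sketch is that the router's output rows are pruned together with the experts, so the remaining routing logits still index the surviving expert pool; how the keep sets are actually selected (the evolutionary search the method name suggests) is not shown here.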