cs.CL

Does a Global Perspective Help Prune Sparse MoEs Elegantly?

arXiv:2604.06542v1 Announce Type: new
Abstract: Empirical scaling laws for language models have encouraged the development of ever-larger LLMs, despite their growing computational and memory costs. Sparse Mixture-of-Experts (MoEs) offer a promising al…