Mixture of Heterogeneous Grouped Experts for Language Modeling
arXiv:2604.23108v1 Announce Type: new
Abstract: Large Language Models (LLMs) based on Mixture-of-Experts (MoE) are pivotal in industrial applications for their ability to scale performance efficiently. However, standard MoEs enforce uniform expert siz…
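The abstract contrasts standard MoE layers, which give every expert the same size, with heterogeneous experts. As a hedged illustration of the baseline idea (not the paper's method), the sketch below routes a token through the top-k of several MLP experts whose hidden widths differ; all names and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 8
# Hypothetical heterogeneous experts: each has its own hidden width,
# unlike a standard MoE where all experts share one size.
hidden_dims = [4, 8, 16]

class Expert:
    """A small two-layer ReLU MLP with a per-expert hidden dimension."""
    def __init__(self, d_model, d_hidden):
        self.w_in = rng.standard_normal((d_model, d_hidden)) / np.sqrt(d_model)
        self.w_out = rng.standard_normal((d_hidden, d_model)) / np.sqrt(d_hidden)

    def __call__(self, x):
        return np.maximum(x @ self.w_in, 0.0) @ self.w_out

experts = [Expert(d_model, h) for h in hidden_dims]
gate_w = rng.standard_normal((d_model, len(experts))) / np.sqrt(d_model)

def moe_forward(x, top_k=2):
    """Route a single token vector x through its top-k experts."""
    logits = x @ gate_w
    top = np.argsort(logits)[-top_k:]            # indices of the top-k experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                         # renormalize over selected experts
    return sum(p * experts[i](x) for p, i in zip(probs, top))

x = rng.standard_normal(d_model)
y = moe_forward(x)
print(y.shape)  # output stays in model dimension: (8,)
```

Because each expert maps back to `d_model`, experts of different capacities can be mixed freely; only the gate decides how much compute a token receives.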