cs.AI, cs.CL, cs.LG

Mixture of Heterogeneous Grouped Experts for Language Modeling

arXiv:2604.23108v1 Announce Type: new
Abstract: Large Language Models (LLMs) based on Mixture-of-Experts (MoE) are pivotal in industrial applications for their ability to scale performance efficiently. However, standard MoEs enforce uniform expert siz…
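The abstract contrasts standard MoE layers, whose experts all share one size, with experts of varying capacity. As a minimal sketch of that contrast (not the paper's actual architecture, which the truncated abstract does not specify), the NumPy snippet below implements a generic top-1-routed MoE layer whose experts have different hidden widths; the expert count, widths, and initialization scale are all illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch, NOT the paper's method: a top-1 MoE layer whose
# experts have *different* hidden widths, illustrating the heterogeneous
# alternative to the uniform expert size the abstract criticizes.

rng = np.random.default_rng(0)

d_model = 8
hidden_sizes = [4, 8, 16]  # heterogeneous expert widths (assumption)

# Each expert is a two-layer MLP: d_model -> h -> d_model, so experts of
# any width produce outputs in the shared model dimension.
experts = [
    (rng.standard_normal((d_model, h)) * 0.1,
     rng.standard_normal((h, d_model)) * 0.1)
    for h in hidden_sizes
]
router = rng.standard_normal((d_model, len(experts))) * 0.1

def moe_forward(x):
    """Route each token to its top-1 expert and apply that expert's MLP."""
    logits = x @ router                  # (tokens, n_experts) routing scores
    choice = logits.argmax(axis=-1)      # top-1 expert index per token
    out = np.empty_like(x)
    for e, (w_in, w_out) in enumerate(experts):
        mask = choice == e
        if mask.any():
            h = np.maximum(x[mask] @ w_in, 0.0)  # ReLU hidden activation
            out[mask] = h @ w_out
    return out

tokens = rng.standard_normal((5, d_model))
y = moe_forward(tokens)
print(y.shape)  # output keeps d_model regardless of which expert fired
```

Because every expert maps back to `d_model`, tokens routed to a narrow expert and tokens routed to a wide one can be recombined into one output tensor, which is what makes per-expert sizing a free design axis.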