Generalization and Scaling Laws for Mixture-of-Experts Transformers
arXiv:2604.09175v1 Announce Type: new
Abstract: We develop a theory of generalization and scaling for Mixture-of-Experts (MoE) Transformers that cleanly separates \emph{active} per-input capacity from routing combinatorics. By conditioning on fixed ro…
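To make the abstract's distinction concrete, the sketch below shows a top-k routed MoE layer in plain numpy: each token is processed by only k of E experts (the "active" per-input capacity), while the router chooses among C(E, k) expert subsets per token (the routing combinatorics). This is an illustrative assumption, not the paper's construction; names such as `moe_layer`, `d_model`, and `n_experts` are hypothetical.

```python
# Minimal top-k MoE routing sketch (illustrative; not the paper's construction).
from math import comb

import numpy as np


def moe_layer(x, w_router, experts, k=2):
    """Route each token in x (n_tokens, d_model) to its top-k experts.

    experts: list of (w_in, w_out) weight pairs, one per expert FFN.
    Returns the gate-weighted sum of the chosen experts' outputs.
    """
    logits = x @ w_router                       # (n_tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, topk[t]]
        gates = np.exp(scores - scores.max())
        gates /= gates.sum()                    # softmax over the selected experts only
        for gate, e in zip(gates, topk[t]):
            w_in, w_out = experts[e]
            h = np.maximum(x[t] @ w_in, 0.0)    # ReLU expert feed-forward
            out[t] += gate * (h @ w_out)
    return out


rng = np.random.default_rng(0)
d_model, d_ff, n_experts, k = 16, 64, 8, 2
x = rng.normal(size=(4, d_model))
w_router = rng.normal(size=(d_model, n_experts))
experts = [(rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
           for _ in range(n_experts)]

y = moe_layer(x, w_router, experts, k=k)
per_expert = 2 * d_model * d_ff
print("total expert params:   ", n_experts * per_expert)
print("active params per token:", k * per_expert)        # active per-input capacity
print("routing choices per token:", comb(n_experts, k))  # routing combinatorics
```

Conditioning on a fixed routing pattern, as the abstract begins to describe, amounts to holding `topk` fixed so that each token sees an ordinary dense network with only the active parameters.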