Mixture-of-Experts under Finite-Rate Gating: Communication–Generalization Trade-offs
arXiv:2602.15091v2 Announce Type: replace
Abstract: Mixture-of-Experts (MoE) architectures decompose prediction tasks into specialized expert sub-networks selected by a gating mechanism. This letter adopts a communication-theoretic view of MoE gating,…
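Since the abstract is truncated, the paper's exact construction is not visible here; the following is only a minimal, generic sketch of the MoE pattern the abstract describes: a learned gate scores experts, the top-k are selected, and their outputs are combined. The `gate_bits` quantization of the combination weights is a hypothetical stand-in for a "finite-rate" gate, not the authors' method.

```python
import torch
import torch.nn as nn


class TopKGateMoE(nn.Module):
    """Minimal MoE layer: a linear gate scores experts, the top-k experts are
    selected per input, and their outputs are mixed with renormalized gate weights."""

    def __init__(self, d_in, d_out, n_experts=4, k=1, gate_bits=None):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(d_in, d_out) for _ in range(n_experts)]
        )
        self.gate = nn.Linear(d_in, n_experts)
        self.k = k
        # Hypothetical finite-rate constraint: quantize gate weights to this many bits.
        self.gate_bits = gate_bits

    def forward(self, x):
        scores = torch.softmax(self.gate(x), dim=-1)              # (batch, n_experts)
        topk_vals, topk_idx = torch.topk(scores, self.k, dim=-1)  # (batch, k)
        weights = topk_vals / topk_vals.sum(dim=-1, keepdim=True)
        if self.gate_bits is not None:
            # Crude uniform quantization of the combination weights, standing in
            # for a rate-limited description of the gating decision.
            levels = 2 ** self.gate_bits - 1
            weights = torch.round(weights * levels) / levels
        out = x.new_zeros(x.shape[0], self.experts[0].out_features)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


# Example usage (shapes and hyperparameters are arbitrary):
moe = TopKGateMoE(d_in=16, d_out=8, n_experts=4, k=2, gate_bits=3)
y = moe(torch.randn(32, 16))  # -> (32, 8)
```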