Hierarchical Mixture-of-Experts with Two-Stage Optimization
arXiv:2605.08292v1 Announce Type: cross
Abstract: Sparse Mixture-of-Experts (MoE) models scale capacity by routing each token to a small subset of experts. However, their routers exhibit a fundamental trade-off: strong load balancing can suppress expe…
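The token-to-expert routing the abstract describes can be illustrated with a minimal top-k gating sketch. This is not the paper's hierarchical router or its two-stage optimization; it is a generic sparse-MoE routing step under the common assumption that each token's gate weights come from a softmax over its k highest router logits.

```python
import numpy as np

def top_k_route(logits, k=2):
    """Illustrative sparse routing: send each token to its k top experts.

    logits: (num_tokens, num_experts) router scores.
    Returns expert indices (num_tokens, k), highest-scoring first,
    and gate weights renormalized over the selected experts.
    """
    # Indices of the k largest logits per token, descending.
    idx = np.argsort(logits, axis=-1)[:, -k:][:, ::-1]
    top = np.take_along_axis(logits, idx, axis=-1)
    # Softmax over only the selected logits, so gates sum to 1 per token.
    w = np.exp(top - top.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return idx, w

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8))  # 4 tokens, 8 experts (toy sizes)
idx, w = top_k_route(logits, k=2)
```

A load-balancing loss of the kind the abstract alludes to would typically be added on top of these routing decisions, penalizing uneven expert usage across the batch.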