Ryan Wang, Akshita Bhagia, Sewon Min

EMO: Pretraining Mixture of Experts for Emergent Modularity

Ryan Wang, Akshita Bhagia, Sewon Min / May 8, 2026

arXiv:2605.06663v1
Abstract: Large language models are typically deployed as monolithic systems, requiring the full model even when applications need only a narrow subset of capabilities, e.g., code, math, or domain-specific knowled…
