EMO: Pretraining Mixture of Experts for Emergent Modularity
arXiv:2605.06663v1 Announce Type: new
Abstract: Large language models are typically deployed as monolithic systems, requiring the full model even when applications need only a narrow subset of capabilities, e.g., code, math, or domain-specific knowled…