Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts
arXiv:2604.18473v1 Announce Type: new
Abstract: Extending a fully post-trained language model with new domain capabilities is fundamentally limited by monolithic training paradigms: retraining from scratch is expensive and scales poorly, while continu…
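The title's "train separately, merge together" idea can be illustrated with a toy sketch: fine-tune several feed-forward experts independently (one per domain), then stack their weights into a single routed Mixture-of-Experts layer. This is only a minimal illustration of the general pattern, not the paper's actual method; all names (`merge_into_moe`, `moe_forward`), the ReLU FFN shape, and the randomly initialized router are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts = 8, 16, 3

# Stand-ins for expert FFNs that were post-trained separately, one per
# domain. In practice these would be loaded from independent checkpoints.
experts = [
    {"w_in": rng.normal(size=(d_model, d_ff)),
     "w_out": rng.normal(size=(d_ff, d_model))}
    for _ in range(n_experts)
]

def merge_into_moe(experts):
    """Stack independently trained expert FFNs into one MoE layer.

    A router (here randomly initialized; hypothetical) decides which
    expert(s) each token is sent to at inference time.
    """
    return {
        "w_in": np.stack([e["w_in"] for e in experts]),    # (E, d_model, d_ff)
        "w_out": np.stack([e["w_out"] for e in experts]),  # (E, d_ff, d_model)
        "router": rng.normal(scale=0.02, size=(d_model, len(experts))),
    }

def moe_forward(x, moe, top_k=1):
    """Route each token to its top-k experts; mix outputs by softmax weight."""
    logits = x @ moe["router"]                     # (T, E) router scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of top-k experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = top[t]
        w = np.exp(logits[t, sel])
        w /= w.sum()                               # softmax over selected experts
        for k, e in enumerate(sel):
            h = np.maximum(x[t] @ moe["w_in"][e], 0.0)  # ReLU FFN expert
            out[t] += w[k] * (h @ moe["w_out"][e])
    return out

moe = merge_into_moe(experts)
tokens = rng.normal(size=(5, d_model))
y = moe_forward(tokens, moe, top_k=2)
print(y.shape)  # (5, 8)
```

The key property the sketch captures is modularity: adding a new domain means training one more expert in isolation and re-stacking, rather than retraining the merged model from scratch.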