XPERT: Expert Knowledge Transfer for Effective Training of Language Models
arXiv:2605.08842v1 Announce Type: new
Abstract: Mixture-of-Experts (MoE) language models organize knowledge into explicitly routed expert modules, making expert-level representations traceable and analyzable. By analyzing expert activation patterns in…
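The abstract is cut off above, so the paper's actual analysis pipeline is not shown. As a rough illustration of what "tracing expert activation patterns" in a top-k gated MoE layer can look like, here is a minimal PyTorch sketch; the class, buffer, and parameter names (SimpleMoERouter, expert_counts, top_k) are hypothetical and not taken from the paper.

```python
# Illustrative sketch only, not the paper's method: a top-k softmax router
# that records how often each expert is selected, giving a simple
# per-expert activation pattern to analyze.
import torch
import torch.nn as nn


class SimpleMoERouter(nn.Module):
    """Top-k softmax router that also counts expert activations."""

    def __init__(self, d_model: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.top_k = top_k
        # Running count of how many tokens each expert has received.
        self.register_buffer("expert_counts", torch.zeros(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model) -> routing probabilities: (tokens, num_experts)
        probs = torch.softmax(self.gate(x), dim=-1)
        _, top_idx = probs.topk(self.top_k, dim=-1)
        # Accumulate activations per expert for later pattern analysis.
        self.expert_counts += torch.bincount(
            top_idx.flatten(), minlength=self.expert_counts.numel()
        ).float()
        return top_idx  # selected expert indices per token


router = SimpleMoERouter(d_model=16, num_experts=8)
router(torch.randn(32, 16))
# Normalized frequencies form a crude "expert activation pattern".
print(router.expert_counts / router.expert_counts.sum())
```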