cs.LG

Path-Constrained Mixture-of-Experts

arXiv:2603.18297v2 Announce Type: replace
Abstract: Sparse Mixture-of-Experts (MoE) architectures route each token through a subset of experts at each layer independently. We propose viewing MoE computation through the lens of \emph{expert paths} — t…
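The baseline the abstract describes, each layer independently routing every token to a small top-k subset of experts, can be sketched as follows. This is a minimal NumPy illustration of standard top-k gating, not the paper's path-constrained method; the gating matrix `gate_w`, the dimensions, and the function name are assumptions for illustration only.

```python
import numpy as np

def topk_route(x, gate_w, k=2):
    """Standard per-layer top-k MoE routing: each token is sent to the
    k experts with the highest gate scores, with weights renormalized
    over the chosen k via softmax."""
    logits = x @ gate_w                                       # (tokens, experts)
    topk_idx = np.argsort(logits, axis=-1)[:, -k:]            # k chosen experts per token
    topk_logits = np.take_along_axis(logits, topk_idx, axis=-1)
    weights = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)            # combine weights sum to 1
    return topk_idx, weights

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))        # 4 tokens, hidden size 8 (illustrative)
gate_w = rng.standard_normal((8, 6))   # 6 experts (illustrative)
idx, w = topk_route(x, gate_w, k=2)
# Because each layer routes independently, a token's "expert path" across L
# layers is the sequence of per-layer expert choices, one set per layer.
```

Viewing the concatenation of these per-layer choices as a single path is the framing the abstract introduces.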