Mixture of Experts with Soft Nearest Neighbor Loss: Resolving Expert Collapse via Representation Disentanglement
arXiv:2603.26734v1 Announce Type: cross
Abstract: The Mixture-of-Experts (MoE) model uses a set of expert networks, each specializing in a subset of the data under the supervision of a gating network. A common issue in MoE architectures is "expert collapse"…
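As a rough illustration of the setting the abstract describes, below is a minimal sketch (not the paper's implementation) of a dense MoE layer with a softmax gating network, together with a soft nearest neighbor loss (Frosst et al., 2019) computed over the layer's outputs. All module names, hyperparameters, and the choice of gate-argmax pseudo-labels are assumptions made for illustration.

```python
# Hypothetical sketch: dense MoE with a softmax gate, plus a soft nearest
# neighbor loss term. Not the paper's code; names and defaults are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    """Dense mixture of experts: every expert runs; gate weights combine them."""

    def __init__(self, dim: int, hidden: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor):
        weights = F.softmax(self.gate(x), dim=-1)             # (B, E)
        outs = torch.stack([e(x) for e in self.experts], 1)   # (B, E, D)
        y = (weights.unsqueeze(-1) * outs).sum(dim=1)         # (B, D)
        return y, weights


def soft_nearest_neighbor_loss(z: torch.Tensor, labels: torch.Tensor,
                               temperature: float = 1.0) -> torch.Tensor:
    """Soft nearest neighbor loss over representations z with (pseudo-)labels.

    Minimizing it pulls same-label points together; a negative weight on this
    term pushes classes apart, i.e. disentangles them in representation space.
    """
    dist = torch.cdist(z, z).pow(2)                      # (B, B) squared distances
    logits = -dist / temperature
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(eye, float("-inf"))      # exclude self-pairs
    same = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~eye
    log_denom = torch.logsumexp(logits, dim=1)
    log_num = torch.logsumexp(logits.masked_fill(~same, float("-inf")), dim=1)
    per_sample = -(log_num - log_denom)
    has_pos = same.any(dim=1)                            # skip points with no same-label neighbor
    return per_sample[has_pos].mean() if has_pos.any() else z.new_zeros(())


if __name__ == "__main__":
    x = torch.randn(32, 16)
    layer = MoELayer(dim=16, hidden=64, num_experts=4)
    y, gate_w = layer(x)
    # Assumption: use the gate's argmax as pseudo-labels for the SNNL term.
    snnl = soft_nearest_neighbor_loss(y, gate_w.argmax(dim=1), temperature=0.5)
    print(y.shape, snnl.item())
```

In this sketch the SNNL term is computed with the gate's hard assignments as labels; how (and with what sign or weight) the paper actually combines the loss with the MoE objective to prevent expert collapse is not recoverable from the truncated abstract.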