cs.AI

BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE

arXiv:2605.14438v1 Announce Type: new
Abstract: Mixture-of-Experts (MoE) architectures enhance the efficiency of large language models by activating only a subset of experts per token. However, standard MoE employs a fixed Top-K routing strategy, lead…
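For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of fixed Top-K routing for a single token. It illustrates the standard scheme only, not the BEAM method. The function name `top_k_route` and the example logits are hypothetical, and the choice to renormalize with a softmax over only the selected logits is one common variant assumed here; some implementations apply the softmax over all experts before selecting.

```python
import math

def top_k_route(logits, k):
    """Select the k highest-scoring experts for one token.

    logits: router scores, one per expert (illustrative values only).
    Returns (expert_indices, weights), where the weights sum to 1.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest logits, highest first.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only (assumed variant);
    # subtracting the max keeps the exponentials numerically stable.
    m = max(logits[i] for i in chosen)
    exps = [math.exp(logits[i] - m) for i in chosen]
    total = sum(exps)
    return chosen, [e / total for e in exps]

# With a fixed k, every token activates exactly k experts,
# whether its router scores are sharply peaked or nearly flat.
peaked = [5.0, 0.1, 0.0, -0.2]
flat = [1.0, 0.9, 0.8, 0.7]
print(top_k_route(peaked, 2))
print(top_k_route(flat, 2))
```

In this sketch both example tokens are routed to two experts, even though the first token's scores are dominated by a single expert.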