cs.AI, cs.IT, cs.LG, math.IT

Route Experts by Sequence, not by Token

arXiv:2511.06494v2 Announce Type: replace-cross
Abstract: Mixture-of-Experts (MoE) architectures scale large language models (LLMs) by activating only a subset of experts per token, but standard TopK routing assigns the same fixed number of expert…
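The baseline the abstract contrasts against can be illustrated with a minimal sketch of standard per-token TopK routing (this is an assumed, generic formulation for illustration, not the paper's proposed sequence-level method; the function name `topk_route` and all shapes are hypothetical):

```python
import numpy as np

def topk_route(logits, k=2):
    """Standard per-token TopK routing: each token independently
    selects its k highest-scoring experts, so every token is
    assigned the same fixed number of experts."""
    # logits: (num_tokens, num_experts) router scores
    topk_idx = np.argsort(logits, axis=-1)[:, -k:]            # top-k expert indices per token
    topk_scores = np.take_along_axis(logits, topk_idx, -1)    # their raw router scores
    # softmax over only the selected scores to get gating weights
    gates = np.exp(topk_scores - topk_scores.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)
    return topk_idx, gates

rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 8))       # 4 tokens, 8 experts
idx, gates = topk_route(logits, k=2)
# every token gets exactly k experts, regardless of its difficulty
assert idx.shape == (4, 2)
assert np.allclose(gates.sum(axis=-1), 1.0)
```

The fixed `k` per token is exactly the rigidity the title pushes back on: routing decisions are made token by token, with no way to spend more or fewer experts depending on the sequence.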
