cs.LG

Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts

arXiv:2605.04952v1 Announce Type: new
Abstract: Mixture-of-experts (MoE) models enable scalable transformer architectures by activating only a subset of experts per token. Recent evidence suggests that performance improves with increasingly granular e…
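The abstract's premise, that an MoE layer activates only a subset of experts per token, rests on a gating step: a router scores every expert for each token and keeps only the top-k. The paper's adaptive inverted-index routing is not detailed in this truncated abstract, so the sketch below shows only the standard top-k gating baseline it builds on; the function name `top_k_route` and all shapes are illustrative assumptions, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the expert dimension.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def top_k_route(tokens, gate_weights, k=2):
    """Standard top-k MoE gating: each token activates only k experts.

    tokens:       (n_tokens, d_model) activations
    gate_weights: (d_model, n_experts) router projection
    Returns (indices, probs): the k selected expert ids per token and
    their gate probabilities renormalized over the selected experts.
    """
    logits = tokens @ gate_weights                # (n_tokens, n_experts)
    probs = softmax(logits, axis=-1)
    idx = np.argsort(-probs, axis=-1)[:, :k]      # top-k expert ids per token
    top = np.take_along_axis(probs, idx, axis=-1)
    top = top / top.sum(axis=-1, keepdims=True)   # renormalize over chosen k
    return idx, top
```

With granular MoE (the regime the abstract points to), `n_experts` grows large while `k` stays small, which is what makes the routing step itself a candidate for index-based acceleration.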