cs.LG

Temporally Extended Mixture-of-Experts Models

arXiv:2604.20156v1 Announce Type: new
Abstract: Mixture-of-Experts models, now popular for scaling capacity at fixed inference speed, switch experts at nearly every token. Once a model outgrows available GPU memory, this churn can render optimizations…
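The per-token expert churn the abstract refers to can be illustrated with a toy top-1 router (a standard MoE gating scheme, not code from this paper; all names and sizes here are illustrative assumptions): each token's routing decision is made independently, so the active expert can change at almost every position.

```python
import numpy as np

# Toy sketch (illustrative, not the paper's method): a top-1 MoE router
# scores every token against every expert and picks the argmax per token,
# so adjacent tokens frequently route to different experts.
rng = np.random.default_rng(0)

num_tokens, d_model, num_experts = 8, 16, 4          # hypothetical sizes
tokens = rng.normal(size=(num_tokens, d_model))      # token embeddings
gate = rng.normal(size=(d_model, num_experts))       # router weight matrix

logits = tokens @ gate                               # (num_tokens, num_experts)
choices = logits.argmax(axis=-1)                     # top-1 expert per token

# Count how often the chosen expert changes between consecutive tokens.
switches = int((choices[1:] != choices[:-1]).sum())
print("experts per token:", choices.tolist())
print(f"expert switches across {num_tokens} tokens: {switches}")
```

When expert weights must be paged in from host memory because the model outgrows GPU memory, every such switch can trigger a transfer, which is the cost the abstract alludes to.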