Teacher-Guided Routing for Sparse Vision Mixture-of-Experts
arXiv:2604.21330v1 Announce Type: new
Abstract: Recent progress in deep learning has been driven by increasingly large-scale models, but the resulting computational cost has become a critical bottleneck. Sparse Mixture of Experts (MoE) offers an effec…