SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations
arXiv:2512.14080v2 Announce Type: replace-cross
Abstract: Mixture of Experts (MoE) models have emerged as the de facto architecture for scaling up language models without significantly increasing computational cost. Recent MoE models demonstrate a…
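As context for the abstract's claim that MoE scales parameters without proportionally increasing compute, the following is a minimal sketch (not SonicMoE's implementation, and not from the paper) of top-k sparse expert routing: each token activates only k of E experts, so per-token FLOPs grow with k rather than with the total expert count. All names and hyperparameters here are illustrative assumptions.

```python
# Illustrative top-k sparse MoE layer (hypothetical; not SonicMoE's kernels).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Routes each token to k of E experts, so per-token compute
    scales with k, not with the total number of experts E."""
    def __init__(self, d_model: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts)  # router producing expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        scores = self.gate(x)                             # (tokens, E)
        weights, idx = scores.topk(self.k, dim=-1)        # keep only top-k experts
        weights = F.softmax(weights, dim=-1)              # renormalize over the k picks
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                               # (tokens, k) hits for expert e
            rows, slots = mask.nonzero(as_tuple=True)
            if rows.numel():                              # run expert e only on its tokens
                out[rows] += weights[rows, slots].unsqueeze(-1) * expert(x[rows])
        return out

moe = TopKMoE(d_model=64)
y = moe(torch.randn(10, 64))  # 10 tokens; each touches only 2 of 8 experts
```

The per-expert loop makes the sparsity explicit for readability; production MoE stacks instead gather tokens per expert into batched/grouped GEMMs, which is the kind of kernel-level work (IO and tile-aware scheduling) the paper's title refers to.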