cs.AI, cs.LG

SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations

arXiv:2512.14080v2 Announce Type: replace-cross
Abstract: Mixture of Experts (MoE) models have emerged as the de facto architecture for scaling up language models without significantly increasing the computational cost. Recent MoE models demonstrate a…
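The scaling property the abstract refers to — parameter count grows with the number of experts while per-token compute grows only with the number of *activated* experts — can be illustrated with a generic sparse top-k MoE layer. This is a minimal sketch of standard MoE routing, not SonicMoE's IO- or tile-aware kernels; all names (`W_gate`, `moe_layer`, the toy linear experts) are illustrative assumptions.

```python
# Minimal sketch of sparse top-k MoE routing (generic illustration, NOT
# SonicMoE's actual implementation): only k of E experts run per token,
# so compute scales with k while parameters scale with E.
import numpy as np

rng = np.random.default_rng(0)
d, E, k = 8, 4, 2  # hidden size, total experts, experts activated per token

W_gate = rng.standard_normal((d, E))                       # router weights
experts = [rng.standard_normal((d, d)) for _ in range(E)]  # toy linear experts

def moe_layer(x):
    """x: (d,) token vector -> (d,) weighted mix of top-k expert outputs."""
    logits = x @ W_gate                    # (E,) router scores
    top = np.argsort(logits)[-k:]          # indices of the k highest-scoring experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                           # softmax over the selected experts only
    # Only the k selected experts are evaluated; the other E - k are skipped.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

y = moe_layer(rng.standard_normal(d))
print(y.shape)  # (8,)
```

With k fixed (here 2) the per-token FLOPs stay constant as E grows, which is exactly why MoE lets model capacity scale without a matching increase in compute — the routing and expert dispatch then become the IO-bound steps that kernel-level optimizations target.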