cs.DC, cs.LG

FaaSMoE: A Serverless Framework for Multi-Tenant Mixture-of-Experts Serving

arXiv:2604.26881v1 Announce Type: cross
Abstract: Mixture-of-Experts (MoE) models offer high model capacity at low inference cost by activating only a small subset of experts per input. However, deploying MoE models requires all experts to reside i…
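
The mechanism the abstract leans on, activating only a few experts per input, is commonly realized as top-k gating: a router scores all experts, and each input is processed only by its k highest-scoring experts, with the gate weights renormalized over that subset. The sketch below illustrates this generic pattern in PyTorch under stated assumptions; it is not FaaSMoE's implementation, and the names moe_forward, gate, and experts are hypothetical.

    import torch
    import torch.nn.functional as F

    def moe_forward(x, gate, experts, k=2):
        """Generic top-k MoE routing sketch (hypothetical helper,
        not FaaSMoE's API): each row of x is processed by only its
        k highest-scoring experts."""
        logits = gate(x)                        # (batch, num_experts) router scores
        weights, idx = torch.topk(logits, k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # renormalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(k):
            for e, expert in enumerate(experts):
                mask = idx[:, slot] == e        # rows whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

    # Toy usage: 4 experts, route each of 8 inputs to its top-2 experts.
    d, n_exp = 16, 4
    experts = torch.nn.ModuleList(torch.nn.Linear(d, d) for _ in range(n_exp))
    gate = torch.nn.Linear(d, n_exp)
    y = moe_forward(torch.randn(8, d), gate, experts, k=2)
    print(y.shape)  # torch.Size([8, 16])

Note that even with k=2 active experts per input, a conventional deployment still keeps all n_exp expert weights resident, which is the memory burden the abstract identifies as the motivation for a serverless design.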