DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
arXiv:2605.10933v2 Announce Type: replace-cross
Abstract: While Mixture-of-Experts (MoE) scales model capacity without proportionally increasing computation, its massive total parameter footprint creates significant storage and memory-access bottlenecks…
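
The abstract's premise rests on the standard sparse-MoE design: each token is routed to only a few experts, so per-token compute stays roughly constant while total parameters (and thus storage and memory traffic) grow with the expert count. The sketch below is a minimal, generic top-k MoE layer illustrating that trade-off; it is not DECO's actual method, and the class name, dimensions, and routing scheme are illustrative assumptions.

```python
# Minimal sketch of generic top-k sparse MoE routing (NOT the DECO method).
# Total parameters scale with n_experts, but each token only activates top_k of them.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # cheap gating network
        # All experts must be stored, even though few run per token:
        # this is the storage / memory-access bottleneck on end-side devices.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        logits = self.router(x)                        # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)           # renormalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                  # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

moe = SparseMoE(d_model=64, d_ff=256, n_experts=8, top_k=2)
tokens = torch.randn(16, 64)
print(moe(tokens).shape)  # torch.Size([16, 64]); only 2 of 8 experts run per token
```

With these illustrative numbers, the layer stores 8 experts' weights but each token's forward pass touches only 2 of them, which is why MoE compute grows sublinearly in total parameters while the full footprint must still be held in (or paged into) memory.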