cs.LG

TileQ: Efficient Low-Rank Quantization of Mixture-of-Experts with 2D Tiling

arXiv:2605.09281v1 Announce Type: new
Abstract: Mixture-of-Experts (MoE) models achieve remarkable performance by sparsely activating specialized experts, yet the massive number of parameters in their experts poses significant challenges for deployment. While low-rank…
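For context on the sparse activation the abstract refers to: in a typical MoE layer, a router scores every token against all experts and only the top-k experts run on each token. The sketch below is a minimal, generic illustration of top-k routing in PyTorch, not TileQ's method; the function name `topk_moe_forward` and all shapes are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def topk_moe_forward(x, gate_w, expert_ws, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:         (tokens, d_model) input activations
    gate_w:    (d_model, n_experts) router weights
    expert_ws: list of (d_model, d_model) per-expert weight matrices

    Hypothetical helper for illustration only; not from the paper.
    """
    logits = x @ gate_w                       # (tokens, n_experts) router scores
    scores, idx = logits.topk(k, dim=-1)      # keep only the k best experts per token
    weights = F.softmax(scores, dim=-1)       # renormalize over the chosen experts

    out = torch.zeros_like(x)
    for slot in range(k):
        for e, w_e in enumerate(expert_ws):
            mask = idx[:, slot] == e          # tokens routed to expert e in this slot
            if mask.any():
                out[mask] += weights[mask, slot, None] * (x[mask] @ w_e)
    return out

# Toy usage: 8 tokens, hidden size 16, 4 experts, 2 active experts per token.
x = torch.randn(8, 16)
gate_w = torch.randn(16, 4)
experts = [torch.randn(16, 16) for _ in range(4)]
y = topk_moe_forward(x, gate_w, experts)
print(y.shape)  # torch.Size([8, 16])
```

Because only k of the experts run per token, compute stays roughly constant as experts are added, but every expert's weights must still be stored, which is the deployment burden the abstract describes.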