cs.AI, cs.LG

MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization

arXiv:2604.06798v1 Announce Type: cross
Abstract: Mixture-of-Experts (MoE)-based large language models (LLMs) offer strong performance but suffer from high memory and computation costs. Weight binarization provides extreme efficiency, yet existing bin…
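To make the "weight binarization" idea in the abstract concrete, here is a minimal sketch of a standard sign-based 1-bit quantization scheme with a per-output-channel scaling factor (the XNOR-Net-style choice alpha = mean |w|). This is only a generic illustration of weight binarization; it is an assumption for exposition and not the MoBiE method described in the paper, whose details are truncated here.

```python
import numpy as np

def binarize_weights(w: np.ndarray):
    """Sign-based 1-bit quantization with a per-row scale.

    Generic XNOR-Net-style scheme (illustrative only, not MoBiE):
    each weight row is approximated as alpha * sign(w), where
    alpha = mean(|w|) is the least-squares optimal scale for
    b = sign(w).
    """
    alpha = np.mean(np.abs(w), axis=1, keepdims=True)  # per-output-channel scale
    b = np.where(w >= 0, 1.0, -1.0)                    # {-1, +1} binary codes
    return alpha, b

def dequantize(alpha: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Reconstruct the dense approximation alpha * sign(w)
    return alpha * b

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))       # toy weight matrix: 4 output channels
alpha, b = binarize_weights(w)
w_hat = dequantize(alpha, b)      # 1-bit approximation of w
```

Storage drops from one float per weight to one bit per weight plus one float per output channel, which is the source of the "extreme efficiency" the abstract refers to.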