cs.LG

Efficient Mixture-of-Experts LLM Inference with Apple Silicon NPUs

arXiv:2604.18788v1 Announce Type: new
Abstract: The Apple Neural Engine (ANE) is a dedicated neural processing unit (NPU) present in every Apple Silicon chip. Mixture-of-Experts (MoE) LLMs improve inference efficiency via sparse activation but are challen…