Federation of Experts: Communication Efficient Distributed Inference for Large Language Models
arXiv:2605.06206v1 Announce Type: new
Abstract: Mixture of experts has emerged as the primary mechanism for making Large Language Models (LLMs) computationally efficient. However, in distributed settings, communicating token embeddings between experts…