cs.LG

Fast MoE Inference via Predictive Prefetching and Expert Replication

arXiv:2605.11537v1 Announce Type: new
Abstract: The Mixture of Experts (MoE) architecture has become a fundamental building block in state-of-the-art large language models (LLMs), improving domain-specific expertise and scaling model capacity …
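
The abstract is truncated in the feed, but the MoE layer the title builds on is standard: a learned router sends each token to a small top-k subset of experts, so only the selected experts' weights are needed per token. That sparsity is what makes predictive prefetching (fetching an expert's weights before it is selected) and replication of hot experts worthwhile at inference time. Below is a minimal, hypothetical sketch of such a top-k MoE layer in PyTorch; the class name, dimensions, and expert shapes are illustrative and not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Sparse MoE layer: a router picks k of n experts per token.
    Sizes are illustrative, not the paper's configuration."""
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        # Router scores; keep only the top-k experts for each token.
        logits = self.router(x)
        weights, idx = logits.topk(self.k, dim=-1)   # (tokens, k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (idx == e)                        # tokens routed to expert e
            token_ids, slot = mask.nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue  # unselected expert: its weights need not be resident
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

moe = TopKMoE()
y = moe(torch.randn(16, 64))  # 16 tokens, each routed to 2 of 8 experts

In a memory-constrained serving setup, `idx` is only known after the router runs; predicting it ahead of time would let a runtime load the needed expert weights early, and replicating frequently selected experts would spread their load across devices. How the paper realizes these ideas is in the full abstract and text.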