From Tokens to Layers: Redefining Stall-Free Scheduling for MoE Serving with Layered Prefill
arXiv:2510.08055v2
Abstract: Large Language Model (LLM) inference in production must meet stringent service-level objectives for both time-to-first-token (TTFT) and time-between-tokens (TBT) while maximizing throughput under fixe…
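Since the abstract hinges on the two latency metrics, a minimal sketch of how they are typically measured may help. This is not from the paper: `token_stream` is a hypothetical streaming interface, and the timing logic simply reflects the standard definitions of TTFT (request arrival to first output token) and TBT (gap between consecutive output tokens).

```python
import time

def measure_latencies(token_stream):
    """Measure TTFT and TBT for a single request.

    `token_stream` is any iterable yielding output tokens as they are
    generated (a hypothetical streaming interface). Returns the TTFT in
    seconds and the list of per-token TBT gaps in seconds.
    """
    start = time.perf_counter()
    ttft, prev, tbts = None, None, []
    for _ in token_stream:
        now = time.perf_counter()
        if ttft is None:
            ttft = now - start       # time-to-first-token
        else:
            tbts.append(now - prev)  # time-between-tokens
        prev = now
    return ttft, tbts
```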
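The title's "from tokens to layers" suggests scheduling prefill work at layer granularity rather than in token chunks. The toy loop below illustrates that general idea only; it is not the paper's scheduler, and every name in it (`run_prefill_layer`, `run_decode_step`, `next_layer`) is invented for illustration. The point of the structure: a decode step runs every iteration, so decode tokens are never stalled behind an entire prefill pass.

```python
from collections import deque

NUM_LAYERS = 4  # toy model depth; real MoE models have many more layers

def run_prefill_layer(request, layer_idx):
    """Placeholder: run one transformer layer over the request's prompt."""
    ...

def run_decode_step(active_requests):
    """Placeholder: generate one token for every in-flight decode request."""
    ...

def layered_prefill_loop(prefill_queue, active_requests, num_steps):
    """Interleave decode with prefill at *layer* granularity: each
    iteration runs a full decode step first, then advances the pending
    prefill by exactly one layer."""
    for _ in range(num_steps):
        run_decode_step(active_requests)
        if prefill_queue:
            req = prefill_queue[0]
            run_prefill_layer(req, req["next_layer"])
            req["next_layer"] += 1
            if req["next_layer"] == NUM_LAYERS:
                # Prompt has passed through all layers; request can decode.
                active_requests.append(prefill_queue.popleft())

# Usage sketch:
# queue = deque([{"prompt": "...", "next_layer": 0}])
# layered_prefill_loop(queue, active_requests=[], num_steps=16)
```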