cs.AI, cs.LG, math.OC

A Queueing-Theoretic Framework for Stability Analysis of LLM Inference with KV Cache Memory Constraints

arXiv:2605.04595v1 Announce Type: new
Abstract: The rapid adoption of large language models (LLMs) has created significant challenges for efficient inference at scale. Unlike traditional workloads, LLM inference is constrained by both computation and …
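The abstract frames LLM inference stability as a queueing problem under KV cache memory limits. As a rough, generic illustration (not the paper's actual model), the sketch below checks the classic queueing stability condition ρ = λ·E[S] < 1 alongside a naive KV-cache occupancy bound derived from Little's law; every parameter name here is an assumption for illustration only.

```python
# Hypothetical sketch, NOT the paper's framework: joint compute/memory
# stability check for an LLM serving queue. Parameter names are assumed.

def is_stable(arrival_rate: float,
              mean_service_time: float,
              mean_kv_bytes: float,
              kv_capacity_bytes: float) -> bool:
    """Return True if both compute and KV-memory utilization stay below 1.

    arrival_rate       -- requests per second (lambda)
    mean_service_time  -- expected seconds of GPU time per request, E[S]
    mean_kv_bytes      -- expected KV-cache footprint of one in-flight request
    kv_capacity_bytes  -- total KV-cache memory available
    """
    # Classic single-server stability condition: utilization rho < 1.
    rho_compute = arrival_rate * mean_service_time

    # Little's law: mean number of in-service requests = lambda * E[S].
    # Each in-flight request pins roughly mean_kv_bytes of KV cache,
    # so this is a (lower-bound) memory utilization estimate.
    mean_in_flight = arrival_rate * mean_service_time
    rho_memory = mean_in_flight * mean_kv_bytes / kv_capacity_bytes

    return rho_compute < 1.0 and rho_memory < 1.0
```

This ignores queueing delay, batching, and variable sequence lengths, all of which a real analysis along the lines of the title would have to model; it is only meant to show why memory, not just compute, can be the binding constraint.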