GhostServe: A Lightweight Checkpointing System in the Shadow for Fault-Tolerant LLM Serving
arXiv:2605.00831v1 Announce Type: cross
Abstract: The rise of million-token, agent-based applications has placed unprecedented demands on large language model (LLM) inference services. The long-running nature of these tasks increases their susceptibil…