cs.DC, cs.LG, cs.PF

SLO-Guard: Crash-Aware, Budget-Consistent Autotuning for SLO-Constrained LLM Serving

arXiv:2604.17627v1 Announce Type: new
Abstract: Serving large language models under latency service-level objectives (SLOs) is a configuration-heavy systems problem with an unusually failure-prone search space: many plausible configurations crash outr…