Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live
arXiv:2511.02230v4 Announce Type: replace-cross
Abstract: KV cache management is essential for efficient LLM inference. To maximize utilization, existing inference engines evict finished requests’ KV cache if new requests are waiting. This policy brea…
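The abstract contrasts eviction-on-demand (a finished request's KV cache is reclaimed as soon as new requests are waiting) with the title's time-to-live idea, under which a finished conversation turn's cache is retained for a grace period so a follow-up turn can reuse it. A minimal toy sketch of that TTL policy, assuming a block-granular cache pool; all class, method, and parameter names here are illustrative and not taken from the paper:

```python
import time

class KVCachePool:
    """Toy KV-cache pool with a time-to-live on finished requests' caches.
    Instead of evicting a finished turn's cache the moment new requests
    arrive, parked caches survive until their TTL lapses or memory pressure
    forces reclamation (hypothetical sketch, not the paper's implementation)."""

    def __init__(self, capacity_blocks, ttl_seconds):
        self.capacity = capacity_blocks
        self.ttl = ttl_seconds
        self.live = {}    # request_id -> blocks held by running requests
        self.parked = {}  # request_id -> (blocks, expiry) for finished turns

    def used(self):
        return sum(self.live.values()) + sum(b for b, _ in self.parked.values())

    def finish(self, req_id, now=None):
        """Move a finished request's cache to the parked set with a TTL."""
        now = time.monotonic() if now is None else now
        blocks = self.live.pop(req_id)
        self.parked[req_id] = (blocks, now + self.ttl)

    def resume(self, req_id):
        """A new turn of the same conversation reuses its parked cache."""
        blocks, _ = self.parked.pop(req_id)
        self.live[req_id] = blocks
        return True

    def admit(self, req_id, blocks, now=None):
        """Admit a new request, reclaiming expired (then soonest-to-expire)
        parked caches only when the request would not otherwise fit."""
        now = time.monotonic() if now is None else now
        # Drop parked entries whose TTL has already lapsed.
        for rid in [r for r, (_, exp) in self.parked.items() if exp <= now]:
            del self.parked[rid]
        # Under memory pressure, evict parked caches closest to expiry.
        while self.used() + blocks > self.capacity and self.parked:
            victim = min(self.parked, key=lambda r: self.parked[r][1])
            del self.parked[victim]
        if self.used() + blocks > self.capacity:
            return False  # must queue: no reclaimable memory left
        self.live[req_id] = blocks
        return True
```

For example, with a 4-block pool and a 60 s TTL, finishing request "A" (2 blocks) and then admitting "B" (2 blocks) does not evict A's parked cache, so a later turn of A can `resume` without a prefill recompute; only a request that cannot otherwise fit reclaims parked memory.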