cs.AI, cs.DC

StreamServe: Adaptive Speculative Flows for Low-Latency Disaggregated LLM Serving

arXiv:2604.09562v1 Announce Type: cross
Abstract: Efficient LLM serving must balance throughput and latency across diverse, bursty workloads. We introduce StreamServe, a disaggregated prefill-decode serving architecture that combines metric-aware rout…