Yipin Guo, Siddharth Joshi

SplitZip: Ultra Fast Lossless KV Compression for Disaggregated LLM Serving

Yipin Guo, Siddharth Joshi / May 5, 2026

arXiv:2605.01708v1 Announce Type: cross
Abstract: Contemporary systems serving large language models (LLMs) have adopted prefill-decode disaggregation to better load-balance between the compute-bound prefill phase and the memory-bound decode phase. Un…
