cs.AI, cs.DC, cs.LG

SplitZip: Ultra Fast Lossless KV Compression for Disaggregated LLM Serving

arXiv:2605.01708v1 Announce Type: cross
Abstract: Contemporary systems serving large language models (LLMs) have adopted prefill-decode disaggregation to better load-balance between the compute-bound prefill phase and the memory-bound decode phase. Un…