cs.DC, cs.LG

STAR: Decode-Phase Rescheduling for LLM Inference

arXiv:2510.13668v2 Announce Type: replace-cross
Abstract: Large Language Model (LLM) inference has emerged as a fundamental paradigm. However, variations in output length cause severe workload imbalance in the decode phase, particularly for long-outpu…
