cs.DC, cs.LG, cs.PF

AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism

arXiv:2604.27089v1 Announce Type: new
Abstract: Large language models (LLMs) demonstrate enormous utility in long-context tasks that require processing prompts consisting of tens to hundreds of thousands of tokens. However, existing LLM training li…