cs.CL, cs.LG

Training-Inference Consistent Segmented Execution for Long-Context LLMs

arXiv:2605.11744v1 Announce Type: new
Abstract: Transformer-based large language models face severe scalability challenges in long-context generation due to the computational and memory costs of full-context attention. Under practical computation and …