cs.CL

Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm

arXiv:2602.11543v2 Announce Type: replace
Abstract: Pretraining large language models (LLMs) typically requires centralized clusters with thousands of high-memory GPUs (e.g., H100/A100). Recent decentralized training methods reduce communication overhead…
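
To make the general idea behind the decentralized methods mentioned above concrete, here is a minimal, self-contained NumPy sketch of communication-efficient data-parallel training: each worker takes several local SGD steps on its own data shard, and parameters are averaged only every H steps rather than synchronizing gradients at every step. The toy least-squares objective, the worker count, and all names here are illustrative assumptions for exposition only, not the specific method proposed in this paper.

import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem; each worker holds its own data shard.
d, n_per_worker, n_workers = 16, 256, 4
w_true = rng.normal(size=d)
shards = []
for _ in range(n_workers):
    X = rng.normal(size=(n_per_worker, d))
    y = X @ w_true + 0.01 * rng.normal(size=n_per_worker)
    shards.append((X, y))

def local_grad(w, X, y):
    # Gradient of mean squared error on one worker's shard.
    return 2.0 * X.T @ (X @ w - y) / len(y)

H = 10          # local steps between synchronizations (communication rounds)
lr = 0.01
steps = 200
w_local = [np.zeros(d) for _ in range(n_workers)]
comm_rounds = 0

for t in range(1, steps + 1):
    # Each worker updates its own copy using only local data (no communication).
    for k, (X, y) in enumerate(shards):
        w_local[k] -= lr * local_grad(w_local[k], X, y)
    # Every H steps, average parameters across workers (the only communication).
    if t % H == 0:
        w_avg = np.mean(w_local, axis=0)
        w_local = [w_avg.copy() for _ in range(n_workers)]
        comm_rounds += 1

print(f"communication rounds: {comm_rounds} (vs. {steps} for per-step sync)")
print(f"parameter error: {np.linalg.norm(w_local[0] - w_true):.4f}")

In this sketch, communication happens only steps/H times instead of once per step, which is the sense in which such local-update schemes reduce communication overhead relative to standard synchronous data parallelism.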