cs.CL, cs.DC, cs.OS

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

arXiv:2604.05091v1 Announce Type: new
Abstract: We present MegaTrain, a memory-centric system that efficiently trains 100B+ parameter large language models at full precision on a single GPU. Unlike traditional GPU-centric systems, MegaTrain stores par…
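For a sense of scale, the following is a rough back-of-the-envelope sketch of the persistent training state for a 100B-parameter model. It assumes that "full precision" means 32-bit floats (4 bytes per value) and that the optimizer is Adam-style with two state tensors per parameter; the abstract as shown confirms neither assumption. The helper `training_state_gb` is hypothetical, and activations and temporary buffers are ignored.

```python
def training_state_gb(num_params: int,
                      bytes_per_value: int = 4,   # assumption: FP32
                      optimizer_slots: int = 2):  # assumption: Adam-style moments
    """Estimate persistent training state in decimal gigabytes (1 GB = 1e9 bytes).

    Counts weights, gradients, and optimizer state only; activations and
    workspace memory are deliberately left out of this estimate.
    """
    gb = 10 ** 9
    weights = num_params * bytes_per_value
    gradients = num_params * bytes_per_value
    optimizer = num_params * bytes_per_value * optimizer_slots
    return {
        "weights": weights / gb,
        "gradients": gradients / gb,
        "optimizer": optimizer / gb,
        "total": (weights + gradients + optimizer) / gb,
    }


footprint = training_state_gb(100_000_000_000)
for name, size in footprint.items():
    print(f"{name:>10}: {size:,.0f} GB")
```

Under these assumptions the weights alone come to 400 GB and the full state to 1,600 GB, which is more than the on-board memory of commonly available single GPUs. This suggests why a system with this goal would have to keep most of that state outside GPU memory.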