cs.AI, cs.DC, cs.ET, cs.LG, cs.PF

Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading

arXiv:2410.21316v2 Announce Type: replace
Abstract: Transformers and large language models (LLMs) have seen rapid adoption in all domains. Their sizes have exploded to hundreds of billions of parameters and keep increasing. Under these circumstances, …