cs.AI, cs.DC, cs.LG

Scalable Pretraining of Large Mixture of Experts Language Models on the Aurora Supercomputer

arXiv:2604.00785v1 Announce Type: cross
Abstract: Pretraining Large Language Models (LLMs) from scratch requires a massive amount of compute. Aurora is an exascale supercomputer with 127,488 Intel PVC (Ponte Vecchio) GPU tiles. In this work, we …