cs.AI, cs.LG, math.OC

StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models

arXiv:2604.15416v1 Announce Type: cross
Abstract: Sign-based optimization algorithms, such as SignSGD, have garnered significant attention for their remarkable performance in distributed learning and training large foundation models. Despite their emp…