cs.DC, cs.LG, math.OC, stat.ML

Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity

arXiv:2605.13434v1 Announce Type: cross
Abstract: Asynchronous stochastic gradient descent (ASGD) is a standard way to exploit heterogeneous compute resources in distributed learning: instead of forcing fast workers to wait for slow ones, the server u…
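The mechanism the abstract refers to can be illustrated with a toy simulation. This is a generic ASGD sketch on a 1-D quadratic, not the paper's Rescaled ASGD: workers hold possibly stale parameter snapshots, and the server applies each gradient as soon as it arrives instead of waiting for all workers. The function names, objective, and hyperparameters are illustrative assumptions.

```python
import random

def grad(x):
    # Gradient of the toy objective f(x) = 0.5 * (x - 3)^2
    return x - 3.0

def async_sgd(n_workers=4, steps=200, lr=0.1, seed=0):
    """Minimal asynchronous-SGD simulation (illustrative, not the paper's method).

    Each worker computes a gradient at its own stale snapshot of the
    parameter; the server applies the update immediately, so fast
    workers never wait for slow ones.
    """
    rng = random.Random(seed)
    x = 0.0
    snapshots = [x] * n_workers  # each worker's (possibly stale) copy
    for _ in range(steps):
        w = rng.randrange(n_workers)  # whichever worker finishes next
        g = grad(snapshots[w])        # gradient at that worker's stale copy
        x -= lr * g                   # server applies it without waiting
        snapshots[w] = x              # worker pulls the fresh parameter
    return x
```

Because gradients are computed at stale points, individual updates can over- or undershoot, but for a well-conditioned objective and modest staleness the iterate still settles near the minimizer (here, 3).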