Tehila Dahan, Roie Reshef, Sharon Goldstein, Kfir Y. Levy

Bringing Order to Asynchronous SGD: Towards Optimality under Data-Dependent Delays with Momentum

May 5, 2026

arXiv:2605.02043v1
Abstract: Asynchronous stochastic gradient descent (SGD) enables scalable distributed training but suffers from gradient staleness. Existing mitigation strategies, such as delay-adaptive learning rates and stalene…
