High dimensional theory of two-phase optimizers
arXiv:2603.26954v1 Announce Type: new
Abstract: The trend towards larger training setups has brought renewed interest in partially asynchronous two-phase optimizers, which optimize locally and then synchronize across workers. Additionally, recent wor…
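
To make the "optimize locally, then synchronize" pattern concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions and not the paper's method: each worker minimizes a hypothetical scalar quadratic, the synchronization step is a plain average of the workers' parameters, and the names `local_phase`, `two_phase_round`, and `run` are invented for this example.

```python
def local_phase(x, target, lr, steps):
    """Phase 1 (assumed form): `steps` gradient steps on f(x) = 0.5 * (x - target)**2."""
    for _ in range(steps):
        grad = x - target          # gradient of the toy quadratic
        x = x - lr * grad
    return x


def two_phase_round(x_global, targets, lr, local_steps):
    """One round: every worker starts from x_global, optimizes locally, then all average."""
    local_results = [local_phase(x_global, t, lr, local_steps) for t in targets]
    # Phase 2 (assumed form): synchronize by plain averaging across workers.
    return sum(local_results) / len(local_results)


def run(targets, rounds=50, lr=0.1, local_steps=5, x0=0.0):
    """Repeat the two-phase round; `targets` holds one toy objective per worker."""
    x = x0
    for _ in range(rounds):
        x = two_phase_round(x, targets, lr, local_steps)
    return x


if __name__ == "__main__":
    # Three hypothetical workers whose local minimizers are 1, 2 and 6.
    print(run([1.0, 2.0, 6.0]))
```

In this toy setting all workers share the same curvature, so the averaged iterate approaches the mean of the local minimizers (3.0 for the targets above). Practical two-phase optimizers may replace the plain average with a separate outer update rule, which this sketch does not model.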