Scalable Hyperparameter-Divergent Ensemble Training with Automatic Learning Rate Exploration for Large Models
arXiv:2604.24708v1 Announce Type: cross
Abstract: Training large neural networks with data-parallel stochastic gradient descent allocates N GPU replicas to compute effectively identical updates — a practice that leaves the rich space of learning rate…
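The abstract contrasts N replicas computing identical updates with replicas that diverge in their hyperparameters. As a hedged illustration of that idea (not the paper's actual algorithm, which the truncated abstract does not specify), the sketch below runs several "replicas" of the same toy optimization, each with a different learning rate, and selects the rate that achieves the lowest loss; `train_replica` and the quadratic objective are invented for this example.

```python
# Hypothetical sketch: each replica differs only in its learning rate,
# turning redundant data-parallel workers into a learning-rate search.

def loss(w):
    # Simple quadratic objective standing in for a training loss.
    return (w - 3.0) ** 2

def grad(w):
    # Gradient of the quadratic objective above.
    return 2.0 * (w - 3.0)

def train_replica(lr, steps=50, w0=0.0):
    # Plain gradient descent from the same initialization for each replica.
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w, loss(w)

# Replicas share data and initialization; only the learning rate diverges.
learning_rates = [0.001, 0.01, 0.1, 0.5]
results = {lr: train_replica(lr) for lr in learning_rates}

# Pick the replica with the lowest final loss.
best_lr = min(results, key=lambda lr: results[lr][1])
```

In a real data-parallel setting the same pattern would apply per GPU replica, with the winning rate (or a weighted combination of replicas) fed back into subsequent training.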