DBLP: Phase-Aware Bounded-Loss Transport for Burst-Resilient Distributed ML Training
arXiv:2605.01989v1 Announce Type: new
Abstract: Distributed machine learning (ML) training has become a necessity with the prevalence of billion- to trillion-parameter-scale models. While prior work has improved training efficiency from the ML perspect…