cs.DC, cs.LG

FedQueue: Queue-Aware Federated Learning for Cross-Facility HPC Training

arXiv:2605.02125v1 Announce Type: cross
Abstract: Federated learning (FL) across multiple HPC facilities faces stochastic admission delays from batch schedulers, and these delays dominate wall-clock time. Synchronous FL suffers from severe stragglers, while asynch…