Robust and Fast Training via Per-Sample Clipping
arXiv:2605.02701v1 Announce Type: cross
Abstract: We propose a robust gradient estimator based on per-sample gradient clipping and analyze its properties both theoretically and empirically. We show that the resulting method, per-sample clipped SGD (PS…
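The abstract is cut off above, so the paper's exact estimator and step rule are not visible here. As a minimal sketch, assuming the common form of per-sample clipping (each per-sample gradient is rescaled to Euclidean norm at most a threshold `c` before averaging), the idea might look like the following. The function names, the threshold `c`, the learning rate `lr`, and the toy gradients are all hypothetical illustrations and are not taken from the paper.

```python
import math

def clip_to_norm(grad, c):
    """Rescale grad so its Euclidean norm is at most c (assumed clipping rule)."""
    norm = math.sqrt(sum(g * g for g in grad))
    # Gradients already inside the ball of radius c are left unchanged.
    scale = min(1.0, c / norm) if norm > 0.0 else 1.0
    return [scale * g for g in grad]

def per_sample_clipped_mean(per_sample_grads, c):
    """Clip every per-sample gradient individually, then average them."""
    clipped = [clip_to_norm(g, c) for g in per_sample_grads]
    n = len(clipped)
    dim = len(clipped[0])
    return [sum(g[j] for g in clipped) / n for j in range(dim)]

def sgd_step(params, per_sample_grads, lr, c):
    """One plain SGD update that uses the clipped estimator (hypothetical helper)."""
    estimate = per_sample_clipped_mean(per_sample_grads, c)
    return [p - lr * e for p, e in zip(params, estimate)]

# Toy batch: three well-behaved gradients and one large outlier.
grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [300.0, 400.0]]
estimate = per_sample_clipped_mean(grads, c=1.0)
new_params = sgd_step([0.0, 0.0], grads, lr=0.1, c=1.0)
```

In this toy batch the outlier has norm 500 and is scaled down to norm 1 before averaging, so it contributes no more than any other sample. An unclipped mean of the same batch would be dominated by that single gradient, which is the kind of sensitivity a robust estimator is meant to limit.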