A Mechanism Study of Delayed Loss Spikes in Batch-Normalized Linear Models
arXiv:2604.16809v1
Abstract: Delayed loss spikes have been reported in neural-network training, but existing theory mainly explains earlier non-monotone behavior caused by overly large fixed learning rates. We study one stylized hyp…