Muon$^2$: Boosting Muon via Adaptive Second-Moment Preconditioning
arXiv:2604.09967v1 Announce Type: new
Abstract: Muon has emerged as a promising optimizer for large-scale foundation model pre-training by exploiting the matrix structure of neural network updates through iterative orthogonalization. However, its prac…
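The "iterative orthogonalization" mentioned in the abstract can be illustrated with a minimal sketch. It assumes the classical cubic Newton-Schulz iteration, X <- 1.5 X - 0.5 X X^T X, applied to a gradient-like matrix. This is a textbook stand-in, not Muon's tuned polynomial and not the Muon$^2$ method, whose details the truncated abstract does not give. The function names (`matmul`, `transpose`, `newton_schulz_orthogonalize`), the step count, and the example matrix are hypothetical and chosen only for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g, steps=30):
    """Approximate the orthogonal polar factor of a full-rank matrix g.

    Assumption: the classical cubic Newton-Schulz iteration is used here
    purely as an illustration of iterative orthogonalization.
    """
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # which lies inside the iteration's convergence region (0, sqrt(3)).
    norm = math.sqrt(sum(v * v for row in g for v in row))
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxt = matmul(x, transpose(x))  # X X^T
        xxtx = matmul(xxt, x)          # X X^T X
        # Each step pushes every singular value of X toward 1.
        x = [[1.5 * a - 0.5 * b for a, b in zip(row_x, row_c)]
             for row_x, row_c in zip(x, xxtx)]
    return x


# Hypothetical 2x3 "gradient" matrix, used only as example input.
grad = [[2.0, 0.0, 1.0], [0.0, 1.0, 3.0]]
update = newton_schulz_orthogonalize(grad)
gram = matmul(update, transpose(update))  # close to the 2x2 identity
```

After enough steps the rows of `update` are approximately orthonormal, so `gram` is close to the identity. The result keeps the singular directions of the input while discarding the relative scale of its singular values.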