cs.AI, cs.LG

Spectral Flattening Is All Muon Needs: How Orthogonalization Controls Learning Rate and Convergence

arXiv:2605.13079v1 Announce Type: new
Abstract: Muon orthogonalizes the momentum buffer before each update, replacing its singular values with ones via Newton-Schulz iterations. This simple change lets Muon tolerate far larger learning rates and conve…
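
The orthogonalization step described in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes the classical cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X applied to a Frobenius-normalized matrix, which is not necessarily the polynomial, step count, or normalization used in the paper or in any particular Muon implementation. The helper names (`matmul`, `transpose`, `frobenius_norm`, `newton_schulz_orthogonalize`) and the example matrix are hypothetical and chosen only for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_norm(a):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in a for x in row))


def newton_schulz_orthogonalize(m, steps=20, eps=1e-12):
    """Approximate U V^T for m = U S V^T without computing an SVD.

    Assumption: classical cubic iteration, not a tuned variant.
    Dividing by the Frobenius norm puts every singular value in (0, 1],
    where the map s -> 1.5 s - 0.5 s^3 pushes each one toward 1.
    """
    norm = frobenius_norm(m) + eps
    x = [[v / norm for v in row] for row in m]
    for _ in range(steps):
        # x_cubed holds X X^T X for the current iterate.
        x_cubed = matmul(matmul(x, transpose(x)), x)
        x = [
            [1.5 * a - 0.5 * b for a, b in zip(row_x, row_c)]
            for row_x, row_c in zip(x, x_cubed)
        ]
    return x


# Hypothetical stand-in for a momentum buffer (full rank, not symmetric).
momentum = [[3.0, 1.0], [0.5, 2.0]]
update = newton_schulz_orthogonalize(momentum)

# With all singular values near one, update @ update^T is close to identity.
gram = matmul(update, transpose(update))
```

For a full-rank input, `gram` should be close to the identity matrix, which is the "singular values replaced with ones" property the abstract refers to. Singular values that are very small after normalization need more iterations to reach one, so the fixed `steps` value here is a convenience for the example rather than a recommendation.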