Low-rank Orthogonalization for Large-scale Matrix Optimization with Applications to Foundation Model Training
arXiv:2509.11983v2 Announce Type: replace
Abstract: Neural network (NN) training is inherently a large-scale matrix optimization problem, yet the matrix structure of NN parameters has long been overlooked. Recently, the optimizer Muon \citep{jordanmuo…
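The abstract is truncated before it describes the authors' low-rank method. The sketch below therefore shows only the generic, full-rank operation that Muon-style optimizers build on: replacing a momentum matrix with its orthogonal polar factor U Vᵀ (from the SVD G = U Σ Vᵀ) instead of treating the parameters as a flat vector. All function names are hypothetical. The Newton-Schulz routine is the classical cubic iteration, chosen for clarity. Muon's reference implementation uses a tuned higher-order variant and adds refinements such as Nesterov momentum and shape-dependent scaling, all of which are omitted here. This is not the paper's algorithm.

```python
import numpy as np


def orthogonalize(g: np.ndarray) -> np.ndarray:
    """Exact orthogonal polar factor U @ Vt of g via the reduced SVD (reference)."""
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    return u @ vt


def newton_schulz(g: np.ndarray, steps: int = 30) -> np.ndarray:
    """Approximate the polar factor of g with the cubic Newton-Schulz iteration.

    Dividing by the Frobenius norm puts every singular value in (0, 1], inside
    the iteration's convergence region (0, sqrt(3)). Each step maps a singular
    value s to 1.5*s - 0.5*s**3, which drives s toward 1 and leaves U, V unchanged.
    """
    x = g / np.linalg.norm(g)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ (x.T @ x)
    return x


def muon_like_step(w, grad, momentum, lr=0.02, beta=0.95):
    """Hypothetical simplified step: heavy-ball momentum, then orthogonalize it."""
    momentum = beta * momentum + grad
    w = w - lr * newton_schulz(momentum)
    return w, momentum


rng = np.random.default_rng(0)
G = rng.standard_normal((8, 4))  # stand-in for an 8x4 weight-matrix gradient
Q = orthogonalize(G)
print(np.allclose(Q.T @ Q, np.eye(4)))           # columns are orthonormal
print(np.allclose(newton_schulz(G), Q, atol=1e-6))  # iteration matches the SVD result
```

Newton-Schulz uses only matrix multiplications, which is why Muon-style optimizers prefer it to an explicit SVD on accelerators. Both routines still cost a full-size product per step, and that cost is what motivates cheaper approximations for large matrices.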