cs.LG, math.OC

Low-rank Orthogonalization for Large-scale Matrix Optimization with Applications to Foundation Model Training

arXiv:2509.11983v2 Announce Type: replace
Abstract: Neural network (NN) training is inherently a large-scale matrix optimization problem, yet the matrix structure of NN parameters has long been overlooked. Recently, the optimizer Muon \citep{jordanmuo…