OrScale: Orthogonalised Optimization with Layer-Wise Trust-Ratio Scaling
arXiv:2605.07815v1 Announce Type: cross
Abstract: Muon improves neural-network training by orthogonalizing matrix-valued updates, but it leaves each layer’s update magnitude controlled mostly by a global learning rate. We introduce OrScale, a trust-ra…
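To make the two ingredients named in the abstract concrete, here is a minimal sketch in plain Python. It rests on several assumptions. The abstract is cut off before it defines OrScale's scaling rule, so the sketch substitutes a generic LARS/LAMB-style trust ratio (weight norm divided by update norm); this is a stand-in, not the authors' formula. Orthogonalization uses the classical cubic Newton-Schulz iteration rather than the tuned polynomial found in Muon implementations. The function names (`orthogonalize`, `trust_ratio`, `scaled_step`) and the default values are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_norm(a):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in a for x in row))


def orthogonalize(g, steps=30, eps=1e-12):
    """Approximate the orthogonal polar factor of g.

    Uses the classical cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X.
    Dividing by the Frobenius norm first keeps every singular value in (0, 1],
    which is inside the iteration's convergence region.
    """
    scale = frobenius_norm(g) + eps
    x = [[v / scale for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * p - 0.5 * q for p, q in zip(rx, rq)] for rx, rq in zip(x, xxtx)]
    return x


def trust_ratio(weight, update, eps=1e-12):
    """LARS/LAMB-style ratio ||W||_F / ||U||_F.

    Assumption: a generic stand-in, not the rule defined in the paper.
    """
    return frobenius_norm(weight) / (frobenius_norm(update) + eps)


def scaled_step(weight, grad, lr=0.1):
    """One layer update: orthogonalize the gradient, then rescale per layer."""
    update = orthogonalize(grad)
    ratio = trust_ratio(weight, update)
    return [[w - lr * ratio * u for w, u in zip(rw, ru)] for rw, ru in zip(weight, update)]


if __name__ == "__main__":
    weight = [[1.0, 2.0], [3.0, 4.0]]
    grad = [[2.0, 0.0], [1.0, 1.0]]
    print(scaled_step(weight, grad))
```

With this stand-in ratio, the step taken on a layer has Frobenius norm `lr` times the norm of that layer's weights, so the step size tracks each layer's own scale instead of depending only on one global learning rate.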