cs.AI, math.OC, stat.ML

Muon Dynamics as a Spectral Wasserstein Flow

arXiv:2604.04891v1 Announce Type: cross
Abstract: Gradient normalization is central to deep-learning optimization because it stabilizes training and reduces sensitivity to scale. For deep architectures, parameters are naturally grouped into matrices o…