Spectral Condition for $\mu$P under Width-Depth Scaling
arXiv:2603.00541v2
Abstract: Generative foundation models are increasingly scaled in both width and depth, posing significant challenges for stable feature learning and reliable hyperparameter (HP) transfer across model si…