cond-mat.dis-nn, cs.AI, cs.LG, nlin.AO

Finite-Size Gradient Transport in Large Language Model Pretraining: From Cascade Size to Intensive Transport Efficiency

arXiv:2605.02968v1 Announce Type: cross
Abstract: We introduce a finite-size gradient-transport framework for real language-model training, based on five observables $(D,z,\beta,\delta,v_{\mathrm{rel}})$ that separate cascade size, duration, absolute …