Subcritical Signal Propagation at Initialization in Normalization-Free Transformers
arXiv:2604.11890v1 Announce Type: cross
Abstract: We study signal propagation at initialization in transformers through the averaged partial Jacobian norm (APJN), a measure of gradient amplification across layers. We extend APJN analysis to transforme…
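
As a rough illustration of the quantity named in the abstract, the sketch below estimates an APJN-style statistic for a toy bias-free MLP rather than a transformer. It assumes the APJN is the width-normalized expected squared Frobenius norm of the partial Jacobian between two layers; that definition, the function name `apjn_mlp`, and all of its parameters are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def apjn_mlp(depth, width, sigma_w, n_samples=16, activation="tanh", seed=0):
    """Monte Carlo estimate of (1/width) * E ||d h^depth / d h^0||_F^2.

    Hypothetical toy setup: a bias-free MLP with preactivations
    h^{l+1} = W^{l+1} phi(h^l) and weights W_ij ~ N(0, sigma_w^2 / width).
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        h = rng.standard_normal(width)   # preactivation at the starting layer
        jac = np.eye(width)              # d h^0 / d h^0 is the identity
        for _ in range(depth):
            w = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
            if activation == "tanh":
                act, dact = np.tanh(h), 1.0 - np.tanh(h) ** 2
            else:                        # "linear": identity activation
                act, dact = h, np.ones(width)
            # Chain rule: d h^{l+1} / d h^0 = W diag(phi'(h^l)) d h^l / d h^0.
            # Broadcasting w * dact scales column j of w by dact[j].
            jac = (w * dact) @ jac
            h = w @ act
        total += np.sum(jac ** 2) / width
    return total / n_samples


if __name__ == "__main__":
    for sigma_w in (0.5, 1.0, 1.5):
        value = apjn_mlp(depth=8, width=128, sigma_w=sigma_w)
        print(f"sigma_w={sigma_w}: APJN estimate = {value:.3e}")
```

In this toy setting, an estimate that shrinks as depth grows corresponds to a subcritical regime (gradients decay across layers), while one that grows corresponds to a supercritical regime. For the linear activation the expected value is `sigma_w ** (2 * depth)`, which gives a simple sanity check on the estimator.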