cs.CL, cs.LG

When Does Removing LayerNorm Help? Activation Bounding as a Regime-Dependent Implicit Regularizer

arXiv:2604.23434v1 Announce Type: cross
Abstract: Dynamic Tanh (DyT) removes LayerNorm by bounding activations with a learned tanh(alpha x). We show that this bounding is a regime-dependent implicit regularizer, not a uniformly beneficial replacement….
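The mechanism the abstract describes can be sketched in a few lines. This is a hypothetical minimal implementation inferred only from the abstract, not the authors' code: a DyT-style layer bounds each activation elementwise with tanh(alpha * x) (alpha learnable in practice) and applies an affine rescale, in place of LayerNorm's mean/variance normalization. The class name, parameter names, and the per-channel affine terms are assumptions for illustration.

```python
import numpy as np

class DynamicTanh:
    """Hypothetical sketch of a DyT-style layer (inferred from the abstract):
    replaces LayerNorm by bounding activations with a learned tanh(alpha * x),
    followed by an affine rescale. Not the authors' implementation."""

    def __init__(self, dim, alpha_init=0.5):
        self.alpha = alpha_init       # learnable scalar in practice
        self.gamma = np.ones(dim)     # learnable per-channel scale
        self.beta = np.zeros(dim)     # learnable per-channel shift

    def __call__(self, x):
        # tanh squashes every activation into (-1, 1), so the output is
        # bounded by |gamma| + |beta| regardless of the input's magnitude --
        # the "activation bounding" the abstract treats as an implicit
        # regularizer, in contrast to LayerNorm's statistics-based rescaling.
        return self.gamma * np.tanh(self.alpha * x) + self.beta
```

Note the contrast with LayerNorm: there is no dependence on batch or feature statistics, only a fixed saturating nonlinearity whose sharpness is set by alpha.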