LayerNorm Induces Recency Bias in Transformer Decoders
arXiv:2509.21042v4 Announce Type: replace
Abstract: Causal self-attention provides positional information to Transformer decoders. Prior work has shown that stacks of causal self-attention layers alone induce a positional bias in attention scores towa…
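The abstract is cut off, but the mechanism it names, positional bias emerging in stacks of causal self-attention layers and its interaction with LayerNorm, can be probed with a small experiment. Below is a minimal, hypothetical sketch in PyTorch, not the paper's actual method or setup: it builds a randomly initialized stack of causal self-attention layers without any positional encodings, toggles a pre-attention LayerNorm on or off, and reports where the final query position places its attention mass. All hyperparameters (layer count, width, sequence length) are arbitrary illustrative choices.

```python
# Illustrative probe only; not the experimental setup of arXiv:2509.21042.
import torch
import torch.nn as nn


class CausalSelfAttention(nn.Module):
    """Single-head causal self-attention with no positional encodings."""

    def __init__(self, d_model: int):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        self.d_model = d_model

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / self.d_model ** 0.5
        # Causal mask: position t may only attend to positions <= t.
        t = x.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        attn = scores.softmax(dim=-1)  # (batch, seq_len, seq_len)
        return self.proj(attn @ v), attn


def final_query_attention(use_layernorm: bool, n_layers: int = 4,
                          d_model: int = 64, seq_len: int = 32,
                          batch: int = 64, seed: int = 0):
    """Top-layer attention of the last query position over all key positions."""
    torch.manual_seed(seed)
    layers = nn.ModuleList([CausalSelfAttention(d_model) for _ in range(n_layers)])
    norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(n_layers)])
    # i.i.d. Gaussian inputs: any positional signal must come from the stack.
    x = torch.randn(batch, seq_len, d_model)
    attn = None
    with torch.no_grad():
        for layer, norm in zip(layers, norms):
            h = norm(x) if use_layernorm else x  # pre-norm toggle
            out, attn = layer(h)
            x = x + out  # residual connection
    # Use only the last query row: it sees every key, so the profile is not
    # skewed by the causal mask the way an average over all queries would be.
    return attn.mean(dim=0)[-1]  # (seq_len,)


if __name__ == "__main__":
    for flag in (False, True):
        profile = final_query_attention(use_layernorm=flag)
        print(f"LayerNorm={flag}: attention mass on first/last key = "
              f"{profile[0].item():.4f} / {profile[-1].item():.4f}")
```

Comparing the two printed profiles shows whether the randomly initialized stack allocates attention unevenly across key positions and how toggling LayerNorm shifts that allocation; whether the shift matches the recency bias the title claims is for the full paper to establish.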