cs.CL

Characterizing the Expressivity of Local Attention in Transformers

arXiv:2605.00768v1 Announce Type: new
Abstract: The transformer is the most popular neural architecture for language modeling. The cornerstone of the transformer is its global attention mechanism, which lets the model aggregate information from all pr…
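The contrast the title draws can be illustrated with a minimal NumPy sketch. The paper's own definitions are not shown here, so this assumes the common formulations: global (causal) attention lets position i attend to every position j ≤ i, while local attention restricts it to a sliding window of the last w positions.

```python
import numpy as np

def attention(q, k, v, mask):
    """Scaled dot-product attention; mask[i, j] is True where i may attend to j."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)          # forbid masked positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v

n, d = 6, 4
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, n, d))

# Global (causal) attention: token i aggregates over all positions j <= i.
causal = np.tril(np.ones((n, n), dtype=bool))

# Local attention, window w = 2: token i sees only j in [i - w + 1, i].
w = 2
idx = np.arange(n)
local = causal & (idx[None, :] > idx[:, None] - w)

out_global = attention(q, k, v, causal)
out_local = attention(q, k, v, local)
```

With the window mask, each row of the local pattern has at most w nonzero entries, so the cost per token is O(w) rather than O(n); the paper's question is what expressive power this restriction costs.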