cs.CL

Characterizing the Expressivity of Local Attention in Transformers

arXiv:2605.00768v1 Announce Type: new
Abstract: The transformer is the most popular neural architecture for language modeling. The cornerstone of the transformer is its global attention mechanism, which lets the model aggregate information from all pr…
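The contrast the title draws can be illustrated with a minimal NumPy sketch. The paper's own definitions are not shown here, so this assumes the common formulations: global (causal) attention lets position i attend to every position j ≤ i, while local attention restricts it to a sliding window of the last w positions.

```python
import numpy as np

def attention(q, k, v, mask):
    """Scaled dot-product attention; mask[i, j] is True where i may attend to j."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)          # forbid masked positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v

n, d = 6, 4
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, n, d))

# Global (causal) attention: token i aggregates over all positions j <= i.
causal = np.tril(np.ones((n, n), dtype=bool))

# Local attention, window w = 2: token i sees only j in [i - w + 1, i].
w = 2
idx = np.arange(n)
local = causal & (idx[None, :] > idx[:, None] - w)

out_global = attention(q, k, v, causal)
out_local = attention(q, k, v, local)
```

With the window mask, each row of the local pattern has at most w nonzero entries, so the cost per token is O(w) rather than O(n); the paper's question is what expressive power this restriction costs.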