cs.AI, cs.LG

Krause Synchronization Transformers

arXiv:2602.11534v3 Announce Type: replace
Abstract: Self-attention in Transformers relies on globally normalized softmax weights, causing all tokens to compete for influence at every layer. When composed across depth, this interaction pattern induces …
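For context, the "globally normalized softmax weights" the abstract refers to are those of standard scaled dot-product self-attention: each row of the attention matrix is softmax-normalized over all tokens, so every token's influence is traded off against every other's. A minimal single-head NumPy sketch of that standard mechanism (shapes and names are illustrative, not taken from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Standard scaled dot-product self-attention, single head."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # (T, T) pairwise token scores
    A = softmax(scores, axis=-1)    # each row sums to 1: global competition
    return A @ V, A

rng = np.random.default_rng(0)
T, d = 5, 8
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
print(np.allclose(A.sum(axis=-1), 1.0))  # every token's weights sum to 1
```

Because the rows of `A` always sum to 1, raising one token's weight necessarily lowers the others', which is the per-layer competition the abstract describes as compounding across depth.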