Clustering in pure-attention hardmax transformers and its role in sentiment analysis
arXiv:2407.01602v2
Abstract: Transformers are extremely successful machine learning models whose mathematical properties remain poorly understood. Here, we rigorously characterize the behavior of transformers with hardmax …