Average Attention Transformers and Arithmetic Circuits
arXiv:2605.04683v1 Announce Type: cross
Abstract: We analyse the computational power of transformer encoders as sequence-to-sequence functions on vectors. We show that average hard attention can be used to simulate arithmetic circuits if they are give…
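
To make the central notion concrete, here is a minimal sketch of a single average hard attention step. It assumes the definition commonly used in the literature: all positions whose attention score is maximal share the weight uniformly, and every other position receives weight zero. The function name `average_hard_attention` and the list-based representation are illustrative assumptions and are not taken from the paper; exact rationals are used so that averaging introduces no rounding.

```python
from fractions import Fraction


def average_hard_attention(scores, values):
    """Average the value vectors at all positions whose score is maximal.

    Hypothetical helper for illustration only (not from the paper).
    scores: list of numbers, one per position.
    values: list of equal-length vectors (lists of numbers), one per position.
    """
    if not scores or len(scores) != len(values):
        raise ValueError("scores and values must be non-empty and of equal length")
    top = max(scores)
    # Positions tied for the maximal score share the attention weight uniformly.
    winners = [v for s, v in zip(scores, values) if s == top]
    dim = len(winners[0])
    # Component-wise mean over the winning positions, kept as exact rationals.
    return [sum(Fraction(v[i]) for v in winners) / len(winners) for i in range(dim)]
```

With scores `[1, 3, 3, 0]`, positions 1 and 2 tie for the maximum, so the output is the mean of their value vectors. When all scores are equal, the step reduces to a plain average over the whole sequence.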