Lower bounds for one-layer transformers that compute parity
arXiv:2605.12171v1 Announce Type: new
Abstract: This note shows that no self-attention layer post-processed by a rational function can sign-represent the parity function unless the product of the number of heads and the degree of the post-processing f…
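For reference, a minimal sketch of the standard definitions behind the claim (not taken from the truncated abstract; the symbols H for the number of heads and d for the degree of the rational post-processing are assumed names for the quantities the abstract mentions):

% Parity on n bits, in the +/-1 convention.
\[
  \mathrm{PAR}_n(x) \;=\; \prod_{i=1}^{n} x_i, \qquad x \in \{-1,+1\}^n .
\]
% A real-valued model f sign-represents parity if its sign agrees with
% PAR_n on every input:
\[
  \operatorname{sign}\bigl(f(x)\bigr) \;=\; \mathrm{PAR}_n(x)
  \quad \text{for all } x \in \{-1,+1\}^n .
\]
% The abstract's claim has the shape: if f is an H-head self-attention
% layer followed by a degree-d rational function, then f cannot
% sign-represent PAR_n unless the product H * d is large; the exact
% threshold is cut off in the truncated abstract above.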