cs.LG, stat.ML

The Effect of Attention Head Count on Transformer Approximation

arXiv:2510.06662v2 Announce Type: replace
Abstract: The Transformer has become the dominant architecture for sequence modeling, yet a detailed understanding of how its structural parameters influence expressive power remains limited. In this work, we stud…
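To make the structural parameter in question concrete, the following is a minimal NumPy sketch of standard multi-head self-attention in which the head count `num_heads` is the knob the abstract refers to; the model width is split evenly across heads, so varying `num_heads` changes the per-head dimension while keeping the total parameter count fixed. The function names and shapes here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Multi-head self-attention; num_heads is the structural parameter.
    Requires d_model to be divisible by num_heads."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Split each projection into heads: shape (num_heads, seq_len, d_head).
    def split(M):
        return M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    # Scaled dot-product attention, computed per head in parallel.
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    heads = softmax(scores) @ Vh            # (num_heads, seq_len, d_head)
    # Concatenate heads back to (seq_len, d_model) and project out.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

rng = np.random.default_rng(0)
d_model, seq_len = 16, 8
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                  for _ in range(4))
X = rng.standard_normal((seq_len, d_model))
# Same input and weight shapes; only the head count varies.
outputs = {h: multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads=h)
           for h in (1, 2, 4)}
for Y in outputs.values():
    assert Y.shape == (seq_len, d_model)
```

Note that because the projections are divided evenly among heads, head count trades off the number of attention patterns against per-head dimension at constant width, which is what makes it a natural axis along which to study approximation power.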