Stability and Generalization in Looped Transformers
arXiv:2604.15259v1
Abstract: Looped transformers promise test-time compute scaling by spending more iterations on harder problems, but it remains unclear which architectural choices let them extrapolate to harder problems at test ti…