Linearizing Vision Transformer with Test-Time Training
arXiv:2605.02772v1 Announce Type: new
Abstract: While linear-complexity attention mechanisms offer a promising alternative to Softmax attention for overcoming the quadratic bottleneck, training such models from scratch remains prohibitively expensive….
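To make the quadratic-bottleneck contrast concrete, here is a minimal sketch (not the paper's method) comparing standard softmax attention, whose N x N score matrix costs O(N^2 d), with a generic kernelized linear attention that reorders the matrix products to cost O(N d^2). The feature map `phi` below is an illustrative assumption, not taken from the paper.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: materializes an N x N score matrix -> O(N^2 d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # Kernelized linear attention: associativity lets us compute the
    # d x d matrix phi(K)^T V first, so the cost is O(N d^2), linear in N.
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                        # (d, d) summary of keys/values
    z = Kp.sum(axis=0)                   # (d,) normalizer
    return (Qp @ kv) / (Qp @ z)[:, None]

N, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, N, d))
print(softmax_attention(Q, K, V).shape)  # (8, 4)
print(linear_attention(Q, K, V).shape)   # (8, 4)
```

Both variants map N tokens of dimension d to N outputs of dimension d; only the order of multiplications, and hence the asymptotic cost in sequence length N, differs.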