cs.LG

ASAP: Amortized Doubly-Stochastic Attention via Sliced Dual Projection

arXiv:2605.12879v1 Announce Type: new
Abstract: Doubly-stochastic attention has emerged as a transport-based alternative to row-softmax attention, with recent Transformer variants using it to reduce attention sinks and rank collapse while improving pe…
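To make the contrast with row-softmax attention concrete, here is a minimal sketch of generic doubly-stochastic attention via Sinkhorn normalization, which alternately rescales rows and columns of the exponentiated score matrix until both sum to one. This is a standard illustrative baseline, not the paper's ASAP amortized sliced-dual-projection method; the function name and all parameters are hypothetical.

```python
import numpy as np

def sinkhorn_attention(Q, K, n_iters=200, eps=1e-9):
    """Illustrative doubly-stochastic attention (NOT the ASAP method).

    Row-softmax attention normalizes rows only; Sinkhorn iteration
    alternately normalizes rows and columns of the positive score
    matrix so that, at convergence, both row and column sums equal 1.
    """
    d = Q.shape[-1]
    A = np.exp(Q @ K.T / np.sqrt(d))  # positive scaled dot-product scores
    for _ in range(n_iters):
        A = A / (A.sum(axis=1, keepdims=True) + eps)  # rows sum to ~1
        A = A / (A.sum(axis=0, keepdims=True) + eps)  # columns sum to ~1
    return A

rng = np.random.default_rng(0)
n, d = 6, 4
A = sinkhorn_attention(rng.normal(size=(n, d)), rng.normal(size=(n, d)))
# After the final column step, column sums are exact and row sums are
# close, since the iteration has (approximately) converged.
print(np.allclose(A.sum(axis=0), 1.0, atol=1e-6),
      np.allclose(A.sum(axis=1), 1.0, atol=1e-3))
```

Plain Sinkhorn iteration like this costs several full passes over the n x n score matrix; the abstract's "amortized" framing suggests the paper targets exactly this overhead, but the mechanism is not shown in the truncated text.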