Variational Linear Attention: Stable Associative Memory for Long-Context Transformers
arXiv:2605.11196v1 Announce Type: new
Abstract: Linear attention reduces the quadratic cost of softmax attention to $\mathcal{O}(T)$, but its memory state grows as $\mathcal{O}(T)$ in Frobenius norm, causing progressive interference between stored ass…
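The state-growth claim in the abstract can be illustrated with a minimal sketch of linear attention as a recurrent associative memory. This is not the paper's method; the feature map, shapes, and normalization below are common illustrative choices (assumptions), used only to show that the memory state $S_t = \sum_{i \le t} \phi(k_i) v_i^\top$ is updated in $\mathcal{O}(1)$ per token while its Frobenius norm keeps growing with sequence length:

```python
import numpy as np

rng = np.random.default_rng(0)
d, dv, T = 16, 16, 512   # illustrative key/value dims and sequence length

def phi(x):
    # A positive feature map (elu + 1), a common choice in linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

S = np.zeros((d, dv))    # associative memory state: accumulated key-value outer products
z = np.zeros(d)          # normalizer accumulator
norms = []
for t in range(T):
    k = rng.standard_normal(d)
    v = rng.standard_normal(dv)
    q = rng.standard_normal(d)
    fk = phi(k)
    S += np.outer(fk, v)                      # O(1)-per-token rank-1 write
    z += fk
    y = (phi(q) @ S) / (phi(q) @ z + 1e-6)    # read: output for query q
    norms.append(np.linalg.norm(S, "fro"))

# The state's Frobenius norm increases with the number of stored associations,
# so later writes progressively interfere with earlier ones.
assert norms[-1] > norms[T // 4] > norms[0]
```

Each stored association adds a rank-1 term to `S`, so nothing is ever forgotten or rescaled; the growing norm is the source of the interference the abstract refers to.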