Adaptive Memory Decay for Log-Linear Attention
arXiv:2605.06946v1 Announce Type: cross
Abstract: Sequence models face a fundamental tradeoff between memory capacity and computational efficiency. Transformers achieve expressive context modeling at quadratic cost, while linear attention and state-space models …
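The full method is truncated in the abstract, but the title suggests a linear-attention-style recurrence whose memory state is decayed by an input-dependent gate before each update. A minimal NumPy sketch of that general idea follows; the scalar sigmoid gate `lam`, the outer-product state update, and all variable names are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4  # sequence length, head dimension

# toy inputs: per-step queries, keys, values
Q = rng.normal(size=(T, d))
K = rng.normal(size=(T, d))
V = rng.normal(size=(T, d))

# hypothetical input-dependent decay gate in (0, 1), one scalar per step
decay_logits = rng.normal(size=T)
lam = 1.0 / (1.0 + np.exp(-decay_logits))  # sigmoid

S = np.zeros((d, d))  # recurrent key-value memory state
Y = np.zeros((T, d))
for t in range(T):
    # adaptively decay the old memory, then write the new key-value association
    S = lam[t] * S + np.outer(K[t], V[t])
    # read the memory with the current query
    Y[t] = Q[t] @ S
```

Because the state is a fixed d-by-d matrix updated once per step, this runs in linear time in sequence length; the adaptive gate lets the model forget stale context faster than a fixed decay constant would.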