cs.AI, cs.LG

Adaptive Memory Decay for Log-Linear Attention

arXiv:2605.06946v1 Announce Type: cross
Abstract: Sequence models face a fundamental tradeoff between memory capacity and computational efficiency. Transformers achieve expressive context modeling at quadratic cost, while linear attention and state-space models…
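To make the tradeoff concrete, here is a minimal sketch of causal linear attention with an exponential memory decay, the general mechanism family the title refers to. This is an illustration only, not the paper's method: the scalar decay `gamma`, the function name, and the NumPy formulation are all assumptions. The recurrent state costs O(d²) memory per step instead of the O(T²) attention matrix of a softmax Transformer.

```python
import numpy as np

def linear_attention_with_decay(Q, K, V, gamma=0.9):
    """Causal linear attention maintained as a recurrent state.

    The state S accumulates outer products k_t v_t^T; a scalar decay
    gamma < 1 exponentially forgets older context ("memory decay").
    Runs in O(T * d * d_v) time with O(d * d_v) state, versus the
    O(T^2) cost of softmax attention. Hypothetical illustration.
    """
    T, d = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d, d_v))                      # recurrent memory state
    out = np.empty((T, d_v))
    for t in range(T):
        S = gamma * S + np.outer(K[t], V[t])    # decayed state update
        out[t] = Q[t] @ S                       # read-out for step t
    return out
```

Unrolling the recurrence gives `out[t] = sum_{s<=t} gamma^(t-s) * (q_t . k_s) * v_s`, i.e. attention scores weighted by how long ago each key was seen; an *adaptive* decay would make `gamma` a learned, input-dependent quantity rather than a fixed constant.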