Xiuying Wei, Caglar Gulcehre

RAT+: Train Dense, Infer Sparse — Recurrence Augmented Attention for Dilated Inference

May 4, 2026

arXiv:2602.18196v3
Abstract: Structured dilated attention has an appealing inference-time efficiency knob: it reduces the FLOPs of attention and the KV cache size by a factor of the dilation size D, while preserving long-range c…
