RAT+: Train Dense, Infer Sparse — Recurrence Augmented Attention for Dilated Inference
arXiv:2602.18196v3 Announce Type: replace
Abstract: Structured dilated attention has an appealing inference-time efficiency knob: it reduces the FLOPs of attention and the KV cache size by a factor of the dilation size D, while preserving long-range c…
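
To make the factor-of-D claim concrete, here is a minimal sketch in plain Python. It assumes one common form of causal dilated attention, in which a query at position t scores only the keys at positions t, t - D, t - 2D, and so on. The abstract as shown does not spell out the exact pattern, so the helper names `dilated_kv_indices` and `attention_cost` are hypothetical and this is an illustration, not the paper's implementation.

```python
def dilated_kv_indices(t, dilation):
    """Key positions visible to the query at position t.

    Assumed pattern (not taken from the paper): causal attention restricted
    to positions sharing t's residue modulo `dilation`.
    """
    return list(range(t % dilation, t + 1, dilation))


def attention_cost(seq_len, dilation=1):
    """Number of query-key pairs scored over a sequence (a proxy for attention FLOPs)."""
    return sum(len(dilated_kv_indices(t, dilation)) for t in range(seq_len))


seq_len, D = 1024, 4
dense_pairs = attention_cost(seq_len, dilation=1)
sparse_pairs = attention_cost(seq_len, dilation=D)

# Keys read by the final query: a proxy for the KV entries needed per decoding step.
dense_keys = len(dilated_kv_indices(seq_len - 1, 1))
sparse_keys = len(dilated_kv_indices(seq_len - 1, D))

print(dilated_kv_indices(10, D))      # [2, 6, 10]
print(dense_pairs / sparse_pairs)     # close to D
print(dense_keys // sparse_keys)      # 4
```

Under this assumed pattern, both the number of scored query-key pairs and the number of keys read per decoding step shrink by roughly D, which matches the efficiency knob the abstract describes.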