Long Context Pre-Training with Lighthouse Attention
arXiv:2605.06554v1 Announce Type: new
Abstract: Training causal transformers at extreme sequence lengths is bottlenecked by the quadratic time and memory of scaled dot-product attention (SDPA). In this work, we propose Lighthouse Attention, a training…
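To make the bottleneck concrete, here is a minimal NumPy sketch of standard causal scaled dot-product attention, not the paper's proposed method. The function name `sdpa` and the toy shapes are illustrative; the point is that the score matrix is explicitly (n, n), so time and memory grow quadratically with sequence length n.

```python
import numpy as np

def sdpa(Q, K, V):
    """Standard causal scaled dot-product attention.

    The (n, n) score matrix below is the source of the quadratic
    time and memory cost in sequence length n.
    """
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                  # shape (n, n): quadratic
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores[mask] = -np.inf                         # causal: hide future tokens
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                             # shape (n, d)

n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
out = sdpa(Q, K, V)
print(out.shape)  # (8, 4)
```

Because of the causal mask, position 0 attends only to itself, so the first output row equals `V[0]`; every doubling of n quadruples the size of `scores`, which is exactly the scaling the abstract identifies.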