Lighthouse Attention: Rethinking Long-Context Transformer Training

Scaling transformers has exposed a new bottleneck in AI systems: attention computation at extreme sequence lengths, where the cost of standard self-attention grows quadratically with the number of tokens.