Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection
arXiv:2602.03216v2 Announce Type: replace
Abstract: The quadratic complexity of attention remains the central bottleneck in long-context inference for large language models. Prior acceleration methods either sparsify the attention map with structured …
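The quadratic cost arises because every query attends to every key. A minimal sketch of the generic idea of token-level sparsification (selecting only the top-k highest-scoring keys per query) is shown below; this is an illustration of the general technique, not the paper's interleaved-selection algorithm, and all function names here are assumptions.

```python
import numpy as np

def dense_attention(q, K, V):
    """Full attention for one query: O(n) score computations per query."""
    scores = K @ q / np.sqrt(q.shape[0])      # (n,) similarity scores
    w = np.exp(scores - scores.max())
    w /= w.sum()                              # softmax over all n keys
    return w @ V

def topk_sparse_attention(q, K, V, k):
    """Token-sparse attention sketch: softmax over only the k best keys."""
    scores = K @ q / np.sqrt(q.shape[0])
    idx = np.argpartition(scores, -k)[-k:]    # indices of top-k scores
    s = scores[idx]
    w = np.exp(s - s.max())
    w /= w.sum()                              # softmax restricted to k tokens
    return w @ V[idx]
```

With k equal to the full sequence length the sparse variant reduces to dense attention; smaller k trades a small approximation error for a proportional reduction in work per query.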