cs.CL, cs.LG

Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection

arXiv:2602.03216v2 Announce Type: replace
Abstract: The quadratic complexity of attention remains the central bottleneck in long-context inference for large language models. Prior acceleration methods either sparsify the attention map with structured …
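The truncated abstract does not specify the paper's selection mechanism, but the idea of sparsifying attention via token selection can be illustrated generically. The sketch below is a hypothetical top-k token-selection attention for a single query: instead of forming the full weighted sum over all n keys (the quadratic cost per sequence), it restricts softmax and value aggregation to the k highest-scoring tokens. All function names and the choice of top-k scoring are illustrative assumptions, not the paper's method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topk_sparse_attention(q, K, V, k):
    """Illustrative sketch (not the paper's algorithm): for one query
    vector q, attend only to the k keys with the highest dot-product
    scores, so the softmax and value sum cost O(k) instead of O(n)."""
    scores = K @ q                           # (n,) raw attention logits
    idx = np.argpartition(scores, -k)[-k:]   # indices of the top-k tokens
    w = softmax(scores[idx])                 # renormalize over selected tokens
    return w @ V[idx]                        # sparse weighted sum of values

rng = np.random.default_rng(0)
n, d = 64, 8
q = rng.normal(size=d)
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d))

dense = softmax(K @ q) @ V                  # full O(n) attention for one query
sparse = topk_sparse_attention(q, K, V, k=16)
print(dense.shape, sparse.shape)
```

With k = n the selection covers every token and the sketch reduces exactly to dense attention; shrinking k trades fidelity for compute, which is the basic premise of token-selection sparsity.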