MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference
arXiv:2605.07363v1 Announce Type: cross
Abstract: DeepSeek Sparse Attention (DSA) sets the state of the art for fine-grained inference-time sparse attention by introducing a learned token-wise indexer that scores every prefix token and selects the mos…
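The indexer-then-select pattern described in the abstract can be illustrated with a minimal sketch. It makes several assumptions that do not come from the paper: the indexer is a plain dot product between a small index query and per-token index keys, selection keeps the k highest-scoring prefix tokens, and attention is single-head. The names `select_top_k`, `sparse_attention`, `index_query`, and `index_keys` are hypothetical and used only for illustration; this is not DSA's or MISA's actual implementation.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_top_k(index_scores, k):
    """Return the positions of the k highest-scoring prefix tokens, in position order."""
    k = min(k, len(index_scores))
    ranked = sorted(range(len(index_scores)),
                    key=lambda i: index_scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query, keys, values, index_query, index_keys, k):
    """Single-query attention restricted to the tokens chosen by the indexer."""
    # Step 1: the indexer scores every prefix token
    # (assumption: a simple dot-product scorer stands in for the learned indexer).
    index_scores = [dot(index_query, ik) for ik in index_keys]

    # Step 2: keep only the k best-scoring tokens (assumption: top-k selection).
    selected = select_top_k(index_scores, k)

    # Step 3: ordinary scaled softmax attention over the selected tokens only.
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in selected]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    weights = [w / total for w in weights]

    dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, selected))
              for d in range(dim)]
    return output, selected
```

In this sketch the indexer still touches every prefix token, but only k tokens enter the softmax and the weighted sum over values, which is where the savings of a token-wise sparse scheme would come from.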