cs.AI, cs.CL, cs.LG

Native Hybrid Attention for Efficient Sequence Modeling

arXiv:2510.07019v3 Announce Type: replace-cross
Abstract: Transformers excel at sequence modeling but face quadratic complexity, while linear attention offers improved efficiency but often compromises recall accuracy over long contexts. In this work, …
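To make the trade-off in the abstract concrete, here is a minimal NumPy sketch (not from the paper) contrasting standard softmax attention, whose n-by-n score matrix gives quadratic cost in sequence length, with a generic kernelized linear-attention approximation that is linear in sequence length but compresses context into a fixed-size state. The function names and the feature map phi are illustrative assumptions, not the paper's Native Hybrid Attention mechanism.

import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: the (n x n) score matrix makes cost quadratic in n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Kernelized linear attention (illustrative): associativity lets us build
    # fixed-size summaries of K and V first, so cost is linear in n, but the
    # bounded state is what can limit exact recall over long contexts.
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                       # (d, d_v) key-value summary
    z = Kp.sum(axis=0)                  # (d,) normalizer
    return (Qp @ kv) / (Qp @ z)[:, None]

n, d = 512, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out_quadratic = softmax_attention(Q, K, V)   # O(n^2 d) time, O(n^2) memory
out_linear = linear_attention(Q, K, V)       # O(n d^2) time, O(d^2) memory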