cs.AI, cs.CL, cs.LG

Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models

arXiv:2510.17196v3 Announce Type: replace-cross
Abstract: Effectively processing long contexts is a critical challenge for language models. While standard Transformers are limited by quadratic complexity and poor length extrapolation, alternative arch…