SWAA: Sliding Window Attention Adaptation for Efficient and Quality Preserving Long Context Processing
arXiv:2512.10411v5
Abstract: The quadratic complexity of self-attention in Transformer-based LLMs renders long-context inference prohibitively expensive. While Sliding Window Attention (SWA), the simplest sparse attention patter…
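
As a rough illustration of the mechanism the abstract refers to, here is a minimal NumPy sketch of causal sliding-window attention: each query attends only to the `window` most recent positions, which is what makes the pattern sparse compared with full self-attention. The function name, shapes, and window semantics are illustrative assumptions, not the paper's SWAA method.

```python
import numpy as np

def sliding_window_attention(q, k, v, window: int):
    """Scaled dot-product attention restricted to a causal sliding window.

    Illustrative sketch only (not the SWAA implementation from the paper).
    q, k, v: arrays of shape (seq_len, d). `window` is the number of most
    recent positions each query may attend to, including itself.
    """
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)          # (seq_len, seq_len) attention logits

    # Causal sliding-window mask: query i sees keys j with i - window < j <= i.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    mask = (j <= i) & (j > i - window)
    scores = np.where(mask, scores, -np.inf)

    # Softmax over the allowed keys only (masked entries become weight 0).
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy usage: 8 tokens, head dimension 4, window of 3 positions.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4))
k = rng.standard_normal((8, 4))
v = rng.standard_normal((8, 4))
out = sliding_window_attention(q, k, v, window=3)
print(out.shape)  # (8, 4)
```

Because each query only scores `window` keys, the attention cost grows linearly with sequence length rather than quadratically, which is the efficiency gain the abstract motivates.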