cs.AI, cs.CL

SWAA: Sliding Window Attention Adaptation for Efficient and Quality Preserving Long Context Processing

arXiv:2512.10411v5 Announce Type: replace
Abstract: The quadratic complexity of self-attention in Transformer-based LLMs renders long-context inference prohibitively expensive. While Sliding Window Attention (SWA), the simplest sparse attention patter…
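The abstract contrasts full quadratic self-attention with Sliding Window Attention, in which each query position attends only to a fixed-size window of recent keys. The following is a minimal illustrative sketch of a causal sliding-window attention mask in NumPy; it is not the paper's SWAA method, and the function name and window convention are assumptions for illustration.

```python
import numpy as np

def sliding_window_attention(q, k, v, window):
    """Naive causal sliding window attention (illustrative sketch only).

    Each query position i attends only to key positions in
    [i - window + 1, i], so cost per query is O(window) rather
    than O(sequence length). Shapes: q, k, v are (T, d).
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    idx = np.arange(T)
    # Mask future positions (causal) and positions outside the window.
    mask = (idx[None, :] > idx[:, None]) | (idx[:, None] - idx[None, :] >= window)
    scores[mask] = -np.inf
    # Row-wise softmax over the surviving (in-window) scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With `window=1` each position attends only to itself, so the output equals `v`; larger windows interpolate toward full causal attention, which is recovered when `window >= T`.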