cs.CL

DySCO: Dynamic Attention-Scaling Decoding for Long-Context Language Models

arXiv:2602.22175v2 Announce Type: replace
Abstract: Understanding and reasoning over long contexts is a crucial capability for language models (LMs). Although recent models support increasingly long context windows, their accuracy often deteriorates a…