cs.AI, cs.LG, math.PR, math.ST, stat.TH

Scaling Limits of Long-Context Transformers

arXiv:2605.08505v1 Announce Type: cross
Abstract: We study the long-context limit of softmax self-attention with a fixed query and a random context of $n$ i.i.d. keys on the sphere, viewing the inverse temperature $\beta_n$ as the scaling parameter th…
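The setup in the abstract can be sketched numerically: draw a fixed unit-norm query and $n$ i.i.d. keys uniform on the unit sphere, then compute softmax attention weights at inverse temperature $\beta$. This is a minimal illustration only; the specific choices of $\beta_n$ below are assumed for demonstration and are not the regimes analyzed in the paper.

```python
import numpy as np

def sphere_points(n, d, rng):
    # n i.i.d. points uniform on the unit sphere S^{d-1}:
    # normalize standard Gaussian vectors.
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def softmax_attention(q, K, beta):
    # Attention weights for a fixed query q over keys K
    # at inverse temperature beta.
    logits = beta * (K @ q)
    logits -= logits.max()  # shift for numerical stability
    w = np.exp(logits)
    return w / w.sum()

rng = np.random.default_rng(0)
d, n = 16, 10_000
q = sphere_points(1, d, rng)[0]   # fixed query on the sphere
K = sphere_points(n, d, rng)      # random context of n i.i.d. keys

# Illustrative (assumed) scalings of beta with n: as beta grows,
# the weight mass concentrates on the keys best aligned with q.
for beta in (1.0, np.sqrt(n), float(n)):
    w = softmax_attention(q, K, beta)
    print(f"beta={beta:10.1f}  max weight={w.max():.4f}")
```

At small $\beta$ the weights are nearly uniform (each close to $1/n$), while at large $\beta$ the distribution concentrates on the best-aligned key, which is the qualitative transition the choice of scaling $\beta_n$ controls.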