cs.CL, cs.LG

Nectar: Neural Estimation of Cached-Token Attention via Regression

arXiv:2605.09778v1 Announce Type: cross
Abstract: Evaluating softmax attention over a fixed long context requires reading every cached key-value pair for each new query token. For a given context (a book, a manual, a legal corpus), the attention output…
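
For illustration only, the baseline cost described in the abstract's first sentence can be sketched as follows. This is not the Nectar method, only the exact computation it is contrasted with: one query attending over a cache of key and value vectors. The function name `cached_attention`, the plain-list representation, and the 1/sqrt(d) scaling (standard scaled dot-product attention) are assumptions made for this sketch, not details taken from the paper.

```python
import math


def cached_attention(query, keys, values):
    """Exact softmax attention for one query over cached key/value vectors.

    Hypothetical helper for illustration; names and the 1/sqrt(d) scaling
    are assumptions, not taken from the paper.
    """
    scale = 1.0 / math.sqrt(len(query))
    # One dot product per cached key: every key in the cache is read.
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    # Subtract the maximum score before exponentiating, for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    # Weighted sum of values: every cached value is read as well.
    out = [0.0] * len(values[0])
    for w, value in zip(weights, values):
        for j, v in enumerate(value):
            out[j] += (w / total) * v
    return out


# Two identical keys receive equal weight, so the output is the mean of the values.
uniform = cached_attention([1.0, 0.0], [[0.0, 1.0], [0.0, 1.0]], [[1.0, 0.0], [3.0, 2.0]])
```

Both loops run over the whole cache, so the work per new query token grows linearly with the number of cached tokens.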