Provably Learning Attention with Queries

arXiv:2601.16873v2

Abstract: We study the problem of learning Transformer-based sequence models with black-box access to their outputs. In this setting, a learner may adaptively query the oracle with any sequence of vectors and observe the output of the target function. We begin by studying the learnability of the simplest formulation, that is, learning a single-head attention-based regressor with queries. We show that for a model with width $d$, there is an elementary algorithm to learn the parameters of single-head attention with $O(d^2)$ queries. Further, we show that if there exists an algorithm to learn ReLU feedforward networks (FFNs), then the single-head algorithm can be easily adapted to learn one-layer Transformers with single-head attention. Next, we show that, in the common regime where the head dimension $r \ll d$, single-head attention-based models can be learned with $O(rd)$ queries via compressed sensing arguments. We also study robustness to noisy oracle access, proving that under mild norm and margin conditions, the parameters can be estimated to $\varepsilon$ accuracy with a polynomial number of queries even when outputs are only provided up to additive tolerance. Finally, we consider the learnability of multi-head attention models and show that they are not identifiable from queries; hence, learnability in the same sense is not feasible without additional assumptions. We discuss potential approaches to learning multi-head attention-based models under certain structural assumptions.
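To make the query model concrete, here is a minimal sketch (not the paper's algorithm) of black-box oracle access to a single-head attention regressor, using the standard softmax-attention definition; the parameter shapes, function names, and the length-1-sequence probe are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def make_attention_oracle(W, v):
    """Black-box oracle for a single-head attention regressor with
    score matrix W (d x d) and value vector v (d,). The learner only
    observes outputs; W and v are hidden. (Illustrative parameterization.)"""
    def oracle(X, q):
        # X: (n, d) sequence of key vectors, q: (d,) query vector
        scores = X @ W @ q            # one attention score per position
        weights = softmax(scores)     # softmax over the n positions
        return float(v @ (X.T @ weights))  # weighted average, read out by v
    return oracle

# Hidden target model (unknown to the learner).
rng = np.random.default_rng(0)
d = 3
W = rng.standard_normal((d, d))
v = np.array([1.0, 2.0, 3.0])
oracle = make_attention_oracle(W, v)

# One illustrative adaptive probe: querying with a length-1 sequence
# forces the softmax weight to 1, so the output equals v^T x exactly,
# and d such queries recover v coordinate by coordinate.
e0 = np.eye(d)[0]
print(oracle(e0[None, :], e0))  # -> 1.0, i.e. v[0]
```

Under this parameterization, each query leaks a clean linear functional of the hidden parameters, which is the flavor of argument the abstract's $O(d^2)$-query result suggests; the paper's actual algorithm and model definition may differ.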
