Provably Learning Attention with Queries
arXiv:2601.16873v2 Announce Type: replace
Abstract: We study the problem of learning Transformer-based sequence models with black-box access to their outputs. In this setting, a learner may adaptively query the oracle with any sequence of vectors and …
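To make the query setting in the abstract concrete, here is a minimal sketch of a black-box oracle that a learner can query adaptively with sequences of vectors. It assumes the hidden model is a single-head softmax attention layer with identity values, which is a simplification chosen for illustration; the abstract does not specify the model at this level of detail. The names `AttentionOracle`, `query`, `num_queries`, and the hidden matrix `_W` are hypothetical and do not come from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class AttentionOracle:
    """Hypothetical black box hiding a single-head attention layer.

    The learner never reads _W; it only sees outputs for inputs it chooses.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # Hidden combined query-key matrix (assumed form, for illustration only).
        self._W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
        self.dim = dim
        self.num_queries = 0  # query complexity counter

    def query(self, xs):
        """Return the attention output at the last position of the sequence xs."""
        self.num_queries += 1
        last = xs[-1]
        # Precompute W @ last, then score each position i as x_i^T W last.
        w_last = [
            sum(self._W[r][c] * last[c] for c in range(self.dim))
            for r in range(self.dim)
        ]
        scores = [sum(x[r] * w_last[r] for r in range(self.dim)) for x in xs]
        weights = softmax(scores)
        # Identity values: the output is a convex combination of the inputs.
        return [sum(w * x[k] for w, x in zip(weights, xs)) for k in range(self.dim)]


oracle = AttentionOracle(dim=3)
first = oracle.query([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
# Adaptive step: the second query is built from the first answer.
second = oracle.query([first, [0.0, 0.0, 1.0]])
```

The learner interacts only through `query`, and `num_queries` tracks how many oracle calls it has spent, which is the natural cost measure in a query-access model.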