Transformer-like Inference from Optimal Control
arXiv:2605.15608v1
Abstract: Decoder-only transformers compute the conditional probability of the next token from a sequence of past observations. This paper derives, from first principles, inference architectures that solve the sam…