Transformer Approximations from ReLUs
arXiv:2604.24878v1 Announce Type: new
Abstract: We provide a systematic recipe for translating ReLU approximation results to the softmax attention mechanism. This recipe covers many common approximation targets. Importantly, it yields target-specific, eco…
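
For intuition about why ReLU results can carry over to softmax attention at all, here is a minimal sketch of one standard identity. It is not the recipe from the paper, whose details are not given in this excerpt. A single query attending over two keys with logits (beta*x, 0) and values (x, 0) outputs x*sigmoid(beta*x), which tends to ReLU(x) as the inverse temperature beta grows, with worst-case error of roughly 0.28/beta. The names `attention_relu` and `beta` are hypothetical and chosen only for this illustration.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a short list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_relu(x, beta=50.0):
    # Assumption for illustration: one query, two key/value pairs.
    # Logits (beta*x, 0) and values (x, 0) give x * sigmoid(beta*x).
    weights = softmax([beta * x, 0.0])
    values = [x, 0.0]
    return sum(w * v for w, v in zip(weights, values))


def relu(x):
    return max(x, 0.0)


if __name__ == "__main__":
    for x in (-2.0, -0.1, 0.0, 0.1, 2.0):
        print(x, relu(x), attention_relu(x))
```

Raising `beta` tightens the approximation uniformly in x, at the cost of larger attention logits.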