cs.LG

Variational Neurons in Transformers for Language Modeling

arXiv:2603.28219v1
Abstract: Transformers for language modeling usually rely on deterministic internal computation, with uncertainty expressed mainly at the output layer. We introduce variational neurons into Transformer feed-forwar…