cs.CL

Scaling Probabilistic Transformer via Efficient Cross-Scale Hyperparameter Transfer

arXiv:2604.25409v1 Announce Type: new
Abstract: Probabilistic Transformer (PT), a white-box probabilistic model for contextual word representation, has demonstrated substantial similarity to standard Transformers in both computational structure and do…