cs.AI, cs.LG

Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling

arXiv:2604.19147v1 Announce Type: new
Abstract: Scaling Transformers typically necessitates training larger models from scratch, as standard architectures struggle to expand without discarding learned representations. We identify the primary bottlenec…
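The abstract's notion of expanding a model "without discarding learned representations" can be made concrete with a generic, well-known technique: Net2Net-style function-preserving width expansion. The sketch below is purely illustrative and is not the paper's Nexusformer method; the `widen` helper, the layer shapes, and the replication scheme are all assumptions chosen to demonstrate the general idea that a layer can grow while computing exactly the same function.

```python
import numpy as np

# Illustrative sketch (NOT the paper's method): Net2Net-style
# function-preserving width expansion. Replicating hidden units and
# dividing their outgoing weights by the replica count grows a layer
# without changing the function it computes.

def widen(W1, b1, W2, new_width):
    """Grow the hidden layer of y = W2 @ relu(W1 @ x + b1) to new_width."""
    old_width = W1.shape[0]
    # Pick existing units to replicate (here: uniformly at random).
    idx = np.random.randint(0, old_width, size=new_width - old_width)
    mapping = np.concatenate([np.arange(old_width), idx])
    W1_new = W1[mapping]        # copy incoming weights for each replica
    b1_new = b1[mapping]
    # counts[j] = how many copies of source unit j exist after widening;
    # dividing outgoing weights by it keeps each unit's summed
    # contribution identical to the original.
    counts = np.bincount(mapping, minlength=old_width)
    W2_new = W2[:, mapping] / counts[mapping]
    return W1_new, b1_new, W2_new

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W1, b1 = rng.standard_normal((16, 8)), rng.standard_normal(16)
W2 = rng.standard_normal((4, 16))
y = W2 @ np.maximum(W1 @ x + b1, 0.0)

W1n, b1n, W2n = widen(W1, b1, W2, new_width=24)
y_new = W2n @ np.maximum(W1n @ x + b1n, 0.0)
assert np.allclose(y, y_new)  # widened network computes the same output
```

Because ReLU is applied element-wise, each replicated unit produces the same activation as its source, so scaling the outgoing weights by the inverse replica count reproduces the original output exactly; scaling strategies like this are the usual starting point for "inheritable" model growth.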