cs.CL, cs.LG

Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate

arXiv:2507.07129v3
Abstract: We study a constrained training regime for decoder-only Transformers in which the token interface is fixed, previously trained dense blocks are not reopened, and the active trainable parameter …