MIDUS: Memory-Infused Depth Up-Scaling
arXiv:2512.13751v2 Announce Type: replace
Abstract: Expanding pre-trained language models offers a practical way to increase capacity without training larger models from scratch. Depth Up-Scaling (DUS) does so by duplicating Transformer blocks and ins…
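As a rough illustration of the block-duplication idea behind DUS, the sketch below deepens a toy layer stack by copying a range of blocks. It is not the paper's method: the function name `depth_upscale`, the dictionary stand-ins for Transformer blocks, and the choice to place each copy directly after its original are all assumptions made for illustration, since the abstract is cut off before it describes where the copies go.

```python
import copy


def depth_upscale(blocks, start, end):
    """Return a deeper stack in which blocks[start:end] each appear twice.

    Assumption (not from the source): each duplicate is a deep copy placed
    immediately after the block it was copied from.
    """
    if not 0 <= start <= end <= len(blocks):
        raise ValueError("start/end must satisfy 0 <= start <= end <= len(blocks)")
    expanded = []
    for i, block in enumerate(blocks):
        expanded.append(block)
        if start <= i < end:
            # Deep copy so the duplicate can be trained independently later.
            expanded.append(copy.deepcopy(block))
    return expanded


# Hypothetical toy "blocks": plain dicts standing in for Transformer layers.
base = [{"name": f"block_{i}", "weights": [float(i)]} for i in range(4)]
deeper = depth_upscale(base, 1, 3)
names = [b["name"] for b in deeper]
```

Here a 4-block stack becomes a 6-block stack whose duplicated blocks start from the same weights as their originals but are separate objects, so continued training can update them independently.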