Sparse Layers are Critical to Scaling Looped Language Models
arXiv:2605.09165v1 Announce Type: new
Abstract: Looped language models repeat a set of transformer layers through depth, reducing memory costs and providing natural early-exit points at loop boundaries. However, looped models do not scale as favorably…
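The mechanism in the first sentence can be illustrated with a minimal sketch (all names, shapes, and the exit rule here are illustrative assumptions, not the paper's architecture): one block's weights are reused at every depth step, so parameter memory does not grow with loop count, and each loop boundary is a natural early-exit point.

```python
import numpy as np

# Minimal sketch of a looped model (illustrative, not the paper's design):
# a single shared block applied repeatedly through depth.

rng = np.random.default_rng(0)
d = 8  # hidden size

W = rng.standard_normal((d, d)) / np.sqrt(d)  # the one shared block's weights

def shared_block(h):
    # Residual update so repeated application stays stable.
    return h + np.tanh(h @ W)

def looped_forward(x, n_loops, exit_after=None):
    """Apply the shared block n_loops times.

    exit_after, if set, stops at that loop boundary -- a stand-in for a
    learned exit criterion evaluated between loops."""
    h = x
    for i in range(n_loops):
        h = shared_block(h)
        if exit_after is not None and i + 1 >= exit_after:
            break
    return h

x = rng.standard_normal(d)
deep = looped_forward(x, n_loops=6)                    # full depth
shallow = looped_forward(x, n_loops=6, exit_after=2)   # exit at loop 2

# Memory contrast: the looped model stores one block's parameters regardless
# of depth, whereas an unrolled 6-layer unshared stack stores six copies.
shared_params = W.size        # d * d, independent of loop count
unrolled_params = 6 * W.size  # grows linearly with depth
```

Exiting after loop 2 produces the same hidden state as running only two loops, which is what makes loop boundaries clean early-exit points.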