Parcae: Scaling Laws For Stable Looped Language Models
arXiv:2604.12946v1 Announce Type: new
Abstract: Traditional fixed-depth architectures scale quality by increasing training FLOPs, typically either through increased parameterization (at the expense of a higher memory footprint) or through more data. A potential alternati…