cs.CL, cs.LG

A Single-Layer Model Can Do Language Modeling

arXiv:2605.10643v1 Announce Type: new
Abstract: Modern language models scale depth by stacking layers, each holding its own state – a per-layer KV cache in transformers, a per-layer matrix in Mamba, Gated DeltaNet (GDN), RWKV, and xLSTM. Biological sy…
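The abstract's first sentence describes how state accumulates with depth in stacked models. The sketch below illustrates that idea under simplifying assumptions. It is not the paper's method.

- Cached state is counted in stored scalars, not bytes.
- The transformer KV cache holds one `d_model`-wide key and one value per past token, per layer.
- Each recurrent layer is a single-head `d_model x d_model` matrix.
- A plain, ungated linear-attention update stands in for the more elaborate Mamba, GDN, RWKV, and xLSTM state updates.
- `StateBudget`, `stacked_step`, and the identity projections (`q = k = v = x`) are hypothetical names and choices for illustration.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class StateBudget:
    """Stored scalars in the per-layer state of a stacked model (simplified)."""
    n_layers: int
    d_model: int

    def kv_cache_per_layer(self, seq_len: int) -> int:
        # Assumption: one key and one value vector of width d_model per past token.
        return 2 * seq_len * self.d_model

    def matrix_state_per_layer(self) -> int:
        # Assumption: single-head d_model x d_model state, independent of seq_len.
        return self.d_model * self.d_model

    def total_kv_cache(self, seq_len: int) -> int:
        # Every layer keeps its own cache, so the total grows linearly with depth.
        return self.n_layers * self.kv_cache_per_layer(seq_len)

    def total_matrix_state(self) -> int:
        # Every layer keeps its own matrix, so the total also grows with depth.
        return self.n_layers * self.matrix_state_per_layer()


def zero_states(n_layers: int, d: int) -> List[Matrix]:
    # Build a separate (non-aliased) zero matrix for each layer.
    return [[[0.0] * d for _ in range(d)] for _ in range(n_layers)]


def linear_attention_step(state: Matrix, q: List[float], k: List[float],
                          v: List[float]) -> List[float]:
    # Plain ungated linear-attention update: S <- S + k v^T, then read out q^T S.
    for i, k_i in enumerate(k):
        for j, v_j in enumerate(v):
            state[i][j] += k_i * v_j
    return [sum(q[i] * state[i][j] for i in range(len(q))) for j in range(len(v))]


def stacked_step(states: List[Matrix], x: List[float]) -> List[float]:
    # Each layer reads and writes only its own state matrix.
    # Hypothetical identity projections (q = k = v = x) keep the sketch short.
    for state in states:
        x = linear_attention_step(state, x, x, x)
    return x


budget = StateBudget(n_layers=24, d_model=1024)
print(budget.total_kv_cache(seq_len=4096))  # 201326592 scalars, grows with seq_len
print(budget.total_matrix_state())          # 25165824 scalars, fixed per layer

states = zero_states(n_layers=2, d=2)
print(stacked_step(states, [1.0, 0.0]))     # [1.0, 0.0]
```

Under these assumptions, both per-layer schemes scale linearly with the number of layers. Only the KV cache also grows with sequence length.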