A Single-Layer Model Can Do Language Modeling
arXiv:2605.10643v1 Announce Type: new
Abstract: Modern language models scale depth by stacking layers, each holding its own state – a per-layer KV cache in transformers, a per-layer matrix in Mamba, Gated DeltaNet (GDN), RWKV, and xLSTM. Biological sy…