cs.AI, cs.LG

Normalized Architectures are Natively 4-Bit

arXiv:2605.06067v1 Announce Type: new
Abstract: Training large language models at 4-bit precision is critical for efficiency. We show that nGPT, an architecture that constrains weights and hidden representations to the unit hypersphere, is inherently …