Maxim Fishman, Brian Chmiel, Ron Banner, Daniel Soudry, Boris Ginsburg

Normalized Architectures are Natively 4-Bit

May 8, 2026

arXiv:2605.06067v1 Announce Type: new
Abstract: Training large language models at 4-bit precision is critical for efficiency. We show that nGPT, an architecture that constrains weights and hidden representations to the unit hypersphere, is inherently …
