cs.CL, cs.LG

Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling

arXiv:2512.02010v4 Announce Type: replace
Abstract: As large language models have scaled up, interest has grown in low-precision numerical formats such as NVFP4 as a way to improve speed and reduce memory usage. However, quantizing models to NVFP4 …
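The abstract is cut off before it describes the method. For background only, here is a minimal sketch of plain NVFP4-style fake quantization (quantize, then dequantize). It assumes the commonly documented layout: FP4 E2M1 element values, blocks of 16 elements, and one shared scale per block. It does not implement the paper's adaptive block scaling. The function names `quantize_block` and `fake_quantize_nvfp4` are hypothetical. As a simplification, the block scale is kept in full precision instead of being rounded to FP8 E4M3 and combined with a per-tensor scale.

```python
# Non-negative magnitudes representable by FP4 E2M1; the sign is stored separately.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # NVFP4 shares one scale factor across each block of 16 values


def quantize_block(block: list[float]) -> list[float]:
    """Hypothetical helper: fake-quantize one block and return dequantized values."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Map the block's largest magnitude onto the largest E2M1 value, 6.0.
    # Simplification: real NVFP4 stores this scale in FP8 (E4M3) together with
    # a per-tensor FP32 scale; here it stays in full precision.
    scale = amax / E2M1_GRID[-1]
    out = []
    for v in block:
        mag = abs(v) / scale
        # Round to the nearest grid point. min() keeps the first (smaller)
        # candidate on ties, whereas hardware typically rounds ties to even.
        q = min(E2M1_GRID, key=lambda g: abs(mag - g))
        out.append(-q * scale if v < 0 else q * scale)
    return out


def fake_quantize_nvfp4(values: list[float]) -> list[float]:
    """Hypothetical helper: quantize-dequantize a flat list, block by block."""
    if len(values) % BLOCK_SIZE != 0:
        raise ValueError("pad the input to a multiple of 16 first")
    result: list[float] = []
    for start in range(0, len(values), BLOCK_SIZE):
        result.extend(quantize_block(values[start:start + BLOCK_SIZE]))
    return result


if __name__ == "__main__":
    x = [6.0, 4.9, 5.1, -2.6, 0.2] + [0.0] * 11
    print(fake_quantize_nvfp4(x)[:5])
```

In a block whose largest magnitude is 6.0 (scale 1.0), this baseline sends 4.9 to 4.0 and 5.1 to 6.0. Near the top of the block's range the E2M1 grid steps by 2, so values close to the block maximum can incur large rounding error under this scaling choice.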