cs.LG

SOAR: Scale Optimization for Accurate Reconstruction in NVFP4 Quantization

arXiv:2605.12245v1 Announce Type: new
Abstract: NVFP4 has recently emerged as an efficient 4-bit microscaling format for large language models (LLMs), offering superior numerical fidelity with native hardware support. However, existing methods often y…