Fitting Is Not Enough: Smoothness in Extremely Quantized LLMs
arXiv:2605.08894v1 Announce Type: cross
Abstract: Large language models (LLMs) achieve strong performance but incur high deployment costs, motivating extremely low-bit, albeit lossy, quantization. Existing quantization algorithms mainly focus on improving …