cs.AI, cs.LG

OSC: Hardware Efficient W4A4 Quantization via Outlier Separation in Channel Dimension

arXiv:2604.12782v1 Announce Type: cross
Abstract: While 4-bit quantization is essential for high-throughput deployment of Large Language Models, activation outliers often lead to significant accuracy degradation due to the restricted dynamic range of …
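The abstract says activation outliers hurt 4-bit accuracy because of the restricted dynamic range. A toy example makes this concrete. It assumes symmetric per-tensor INT4 fake quantization with integer levels in [-7, 7] and a single scale set by the largest magnitude. It is not the OSC method; it only shows the failure mode that motivates it. The function names are hypothetical. One large value stretches the scale so far that ordinary activations collapse onto very few levels, and their reconstruction error grows by orders of magnitude.

```python
import random

def quantize_int4_symmetric(values):
    """Fake-quantize floats to symmetric INT4 (levels -7..7) with one per-tensor scale.

    Hypothetical helper for illustration; the scale is set by the largest magnitude,
    so a single outlier dictates the step size for every other value.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    # Round to the nearest integer level, clamp to the 4-bit symmetric range, dequantize.
    return [max(-7, min(7, round(v / scale))) * scale for v in values]

def mean_sq_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
acts = [random.gauss(0.0, 1.0) for _ in range(1024)]  # "well-behaved" activations
with_outlier = acts[:]
with_outlier[0] = 100.0                                # inject one large outlier

# Measure error only on the ordinary entries (index 1 onward) in both cases.
err_clean = mean_sq_error(acts[1:], quantize_int4_symmetric(acts)[1:])
err_outlier = mean_sq_error(with_outlier[1:], quantize_int4_symmetric(with_outlier)[1:])
print(f"MSE without outlier: {err_clean:.4f}")
print(f"MSE with outlier:    {err_outlier:.4f}")
```

With the outlier present, the step size is about 14.3. Nearly all standard-normal activations then round to zero, so their error approaches their variance. Without it, the step size stays below 1 and the error remains small.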