A Hardware-Aware, Per-Layer Methodology for Post-Training Quantization of Large Language Models
arXiv:2605.14929v1 Announce Type: new
Abstract: Scaled Outer Product (SOP) is a post-training quantization methodology for large language model weights, designed to deliver near-lossless fidelity at 4.5–6 bits per weight on hardware with per-layer LU…
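The abstract is truncated, so the exact SOP formulation is not available here. As one hedged reading of "Scaled Outer Product" post-training weight quantization, the per-element scale matrix can be factored as an outer product of row scales and column scales, with the integer grid refined by alternating least-squares updates. The function name `sop_quantize`, the alternating-update scheme, and all parameters below are illustrative assumptions, not the paper's actual algorithm:

```python
import numpy as np

def sop_quantize(W, bits=4, iters=10):
    # Hypothetical sketch: approximate W ~= (r c^T) * Q, where r and c are
    # per-row / per-column scales and Q holds signed integers on a
    # (2^(bits-1) - 1)-level grid. This is an assumed reading of "Scaled
    # Outer Product", not the method described in the (truncated) abstract.
    qmax = 2 ** (bits - 1) - 1
    # Initialize row scales from per-row absolute maxima; column scales at 1.
    r = np.maximum(np.abs(W).max(axis=1, keepdims=True), 1e-12) / qmax
    c = np.ones((1, W.shape[1]))
    for _ in range(iters):
        S = r @ c                                   # outer-product scale matrix
        Q = np.clip(np.round(W / S), -qmax, qmax)   # quantize to integer grid
        # Closed-form least-squares refit of r with Q and c held fixed.
        num_r = (W * (Q * c)).sum(axis=1, keepdims=True)
        den_r = ((Q * c) ** 2).sum(axis=1, keepdims=True) + 1e-12
        r = num_r / den_r
        # Symmetric refit of c with Q and r held fixed.
        num_c = (W * (Q * r)).sum(axis=0, keepdims=True)
        den_c = ((Q * r) ** 2).sum(axis=0, keepdims=True) + 1e-12
        c = num_c / den_c
    S = r @ c
    Q = np.clip(np.round(W / S), -qmax, qmax)
    return Q.astype(np.int8), r, c
```

At 4 bits the integer grid has 15 symmetric levels; storing `r` and `c` alongside `Q` adds only O(m + n) floats per m-by-n layer, which is consistent with the fractional bits-per-weight budgets (4.5 to 6) the abstract quotes, though the exact accounting in the paper is unknown here.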