ConQuR: Corner Aligned Activation Quantization via Optimized Rotations for LLMs
arXiv:2605.10793v1 Announce Type: new
Abstract: Large language models (LLMs) are expensive to deploy because of their large memory footprint and high inference cost. Weight-activation quantization can reduce these costs, but low-bit activation quantization r…