CoQuant: Joint Weight-Activation Subspace Projection for Mixed-Precision LLMs
arXiv:2604.26378v1 Announce Type: new
Abstract: Post-training quantization (PTQ) has become a key technique for reducing the inference cost of Large Language Models (LLMs). While recent mixed-precision methods improve ultra-low-bit quantization…
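To make the PTQ setting concrete, the sketch below shows a generic round-to-nearest post-training weight quantizer with per-row symmetric scales. This is a minimal NumPy illustration of the general technique, not CoQuant's joint weight-activation subspace method; the function names and the 8-bit setting are assumptions for the example.

```python
import numpy as np

def quantize_rtn(w: np.ndarray, bits: int = 8):
    """Generic round-to-nearest PTQ sketch (not CoQuant's method):
    symmetric per-row quantization of a weight matrix."""
    qmax = 2 ** (bits - 1) - 1                         # e.g. 127 for 8 bits
    scale = np.max(np.abs(w), axis=1, keepdims=True) / qmax  # per-row scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Map integer codes back to approximate float weights
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 16)).astype(np.float32)
q, s = quantize_rtn(w)
w_hat = dequantize(q, s)
max_err = float(np.max(np.abs(w - w_hat)))  # bounded by ~scale/2 per row
```

Mixed-precision PTQ methods extend this idea by assigning different bit widths per layer or channel instead of a single global `bits`.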