MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models
arXiv:2508.02343v2 Announce Type: replace
Abstract: Quantization significantly accelerates inference in large language models (LLMs) by replacing original high-precision matrices with low-precision counterparts. Recent advances in weight-activation qu…
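The idea of replacing high-precision matrices with low-precision counterparts can be sketched with a toy per-block quantizer, where each small block of values shares a single scale, in the spirit of microscaling (MX) formats. This is an illustrative assumption-based sketch, not the paper's MicroMix algorithm; the block size, bit width, and function names below are all hypothetical.

```python
import numpy as np

def quantize_blockwise(x, block=32, bits=8):
    """Toy per-block quantization: each group of `block` values shares one
    scale, so storage drops to low-bit integers plus one scale per block.
    Illustrative only; not the MicroMix method from the paper."""
    qmax = 2 ** (bits - 1) - 1
    x = np.asarray(x, dtype=np.float32)
    pad = (-len(x)) % block                      # pad to a whole number of blocks
    blocks = np.pad(x, (0, pad)).reshape(-1, block)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                    # avoid division by zero on all-zero blocks
    q = np.round(blocks / scales).astype(np.int8)
    dequant = (q * scales).reshape(-1)[: len(x)]  # reconstruction for error checking
    return q, scales, dequant

rng = np.random.default_rng(0)
w = rng.normal(size=100).astype(np.float32)
_, _, w_hat = quantize_blockwise(w)
# Per-block rounding error is bounded by half a quantization step.
err = float(np.abs(w - w_hat).max())
```

Because the scale is shared only within a small block rather than across the whole tensor, a single outlier inflates the quantization step for just its own block, which is the usual motivation for block-scaled formats.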