ITQ3_S: High-Fidelity 3-bit LLM Inference via Interleaved Ternary Quantization with Rotation-Domain Smoothing
arXiv:2603.27914v2
Abstract: We present ITQ3_S (Interleaved Ternary Quantization — Specialized), a novel 3-bit weight quantization format for LLMs integrating TurboQuant (TQ), a rotation-domain strategy based on the Fast Walsh-…
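The abstract is cut off before it describes the algorithm, so the sketch below only illustrates, in generic form, the two ingredients the title names. The first is an orthonormal rotation that spreads weight outliers across a block. Because the transform's name is truncated in the source, this sketch assumes a normalized fast Walsh-Hadamard transform. The second is per-block ternary quantization of the rotated weights. The block size of 32, the 0.7·mean(|w|) threshold (a heuristic borrowed from Ternary Weight Networks), and every function name here are hypothetical. This is not the ITQ3_S bit layout, its interleaving scheme, or TurboQuant's actual procedure.

```python
import numpy as np


def fwht(x) -> np.ndarray:
    """Orthonormal fast Walsh-Hadamard transform along the last axis (assumed rotation).

    The length must be a power of two. The normalized Sylvester-Hadamard matrix
    is symmetric and orthogonal, so applying fwht twice returns the input.
    """
    x = np.array(x, dtype=np.float64, copy=True)
    n = x.shape[-1]
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly on pairs (i, i + h) inside consecutive blocks of size 2h.
        y = x.reshape(*x.shape[:-1], n // (2 * h), 2, h)
        a = y[..., 0, :].copy()
        b = y[..., 1, :].copy()
        y[..., 0, :] = a + b
        y[..., 1, :] = a - b
        x = y.reshape(*x.shape[:-1], n)
        h *= 2
    return x / np.sqrt(n)


def ternary_quantize(w, block: int = 32):
    """Hypothetical per-block ternary quantizer: codes in {-1, 0, +1} plus one scale per block."""
    w = np.asarray(w, dtype=np.float64).reshape(-1, block)
    absw = np.abs(w)
    delta = 0.7 * absw.mean(axis=1, keepdims=True)  # per-block zeroing threshold
    mask = absw > delta
    q = (np.sign(w) * mask).astype(np.int8)
    # Scale = mean magnitude of the weights that survive the threshold (0 for all-zero blocks).
    counts = np.maximum(mask.sum(axis=1, keepdims=True), 1)
    scale = (absw * mask).sum(axis=1, keepdims=True) / counts
    return q, scale


def dequantize(q, scale) -> np.ndarray:
    return q.astype(np.float64) * scale


def rotated_ternary_roundtrip(w, block: int = 32) -> np.ndarray:
    """Rotate each block, quantize to ternary, dequantize, then undo the rotation."""
    blocks = np.asarray(w, dtype=np.float64).reshape(-1, block)
    q, scale = ternary_quantize(fwht(blocks), block)
    return fwht(dequantize(q, scale)).reshape(np.shape(w))  # fwht is its own inverse


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=4096)
    w[::512] *= 20.0  # inject a few large outliers, one per affected block
    plain = dequantize(*ternary_quantize(w)).reshape(-1)
    err_plain = np.mean((plain - w) ** 2)
    err_rot = np.mean((rotated_ternary_roundtrip(w) - w) ** 2)
    print(f"MSE plain={err_plain:.4f} rotated={err_rot:.4f}")
```

The design choice being illustrated is that an orthonormal rotation changes nothing about the information in a block but redistributes a single large weight over all 32 coordinates, so a shared per-block scale wastes less of the ternary range on one outlier. Running the script prints the two reconstruction errors for comparison. How much the rotation helps depends on the weight distribution and is not a claim about the paper's reported results.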