Efficient VQ-QAT and Mixed Vector/Linear quantized Neural Networks
arXiv:2604.23172v1 Announce Type: new
Abstract: In this work, we developed and tested three techniques for vector quantization (VQ) based model weight compression. To mitigate codebook collapse and enable end-to-end training, we adopted cosine similarity-…
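The abstract (truncated above) mentions cosine-similarity-based assignment for vector quantization of weights. A minimal sketch of that idea, assuming each weight row is split into sub-vectors matched to the codebook entry with the highest cosine similarity; the function name `cosine_vq` and all shapes here are illustrative, not from the paper:

```python
import numpy as np

def cosine_vq(weights, codebook, eps=1e-8):
    """Assign each weight sub-vector to the codebook entry with the
    highest cosine similarity, then reconstruct from the codebook.

    weights:  (num_vectors, dim) array of weight sub-vectors
    codebook: (num_codes, dim) array of learnable code vectors
    Returns the quantized sub-vectors and the chosen code indices.
    """
    # Normalize rows so the dot product equals cosine similarity.
    w = weights / (np.linalg.norm(weights, axis=1, keepdims=True) + eps)
    c = codebook / (np.linalg.norm(codebook, axis=1, keepdims=True) + eps)
    sims = w @ c.T                 # (num_vectors, num_codes) similarities
    idx = sims.argmax(axis=1)      # best-matching code per sub-vector
    return codebook[idx], idx

# Toy example: 64 sub-vectors of dimension 4, codebook of 16 entries.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 4))
C = rng.normal(size=(16, 4))
W_q, codes = cosine_vq(W, C)
```

In a full QAT setup the hard `argmax` would be paired with a straight-through gradient estimator so the codebook and the upstream layers train end to end; that part is omitted here.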