Efficient VQ-QAT and Mixed Vector/Linear quantized Neural Networks
arXiv:2604.23172v1 Announce Type: new
Abstract: In this work, we developed and tested three techniques for vector quantization (VQ) based model weight compression. To mitigate codebook collapse and enable end-to-end training, we adopted cosine similarity-…
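The abstract (truncated above) mentions cosine-similarity-based assignment for vector quantization of weights. A minimal sketch of that idea, assuming each weight row is split into sub-vectors matched to the codebook entry with the highest cosine similarity; the function name `cosine_vq` and all shapes here are illustrative, not from the paper:

```python
import numpy as np

def cosine_vq(weights, codebook, eps=1e-8):
    """Assign each weight sub-vector to the codebook entry with the
    highest cosine similarity, then reconstruct from the codebook.

    weights:  (num_vectors, dim) array of weight sub-vectors
    codebook: (num_codes, dim) array of learnable code vectors
    Returns the quantized sub-vectors and the chosen code indices.
    """
    # Normalize rows so the dot product equals cosine similarity.
    w = weights / (np.linalg.norm(weights, axis=1, keepdims=True) + eps)
    c = codebook / (np.linalg.norm(codebook, axis=1, keepdims=True) + eps)
    sims = w @ c.T                 # (num_vectors, num_codes) similarities
    idx = sims.argmax(axis=1)      # best-matching code per sub-vector
    return codebook[idx], idx

# Toy example: 64 sub-vectors of dimension 4, codebook of 16 entries.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 4))
C = rng.normal(size=(16, 4))
W_q, codes = cosine_vq(W, C)
```

In a full QAT setup the hard `argmax` would be paired with a straight-through gradient estimator so the codebook and the upstream layers train end to end; that part is omitted here.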