cs.CL, cs.LG

BWTA: Accurate and Efficient Binarized Transformer by Algorithm-Hardware Co-design

arXiv:2604.03957v1 Announce Type: cross
Abstract: Ultra-low-bit quantization brings substantial efficiency gains to Transformer-based models, but accuracy degradation and limited GPU support hinder its wide adoption. In this paper, we analyze zero-point d…