cs.CL, cs.LG

MF-QAT: Multi-Format Quantization-Aware Training for Elastic Inference

arXiv:2604.00529v1 Announce Type: cross
Abstract: Quantization-aware training (QAT) is typically performed for a single target numeric format, while practical deployments often need to choose numerical precision at inference time based on hardware sup…