The Verification Tax: Fundamental Limits of AI Auditing in the Rare-Error Regime
arXiv:2604.12951v1 Announce Type: new
Abstract: The most cited calibration result in deep learning — post-temperature-scaling ECE of 0.012 on CIFAR-100 (Guo et al., 2017) — is below the statistical noise floor. We prove this is not a failure of the …