cs.LG

Unstable Rankings in Bayesian Deep Learning Evaluation

arXiv:2604.23102v1 Announce Type: new
Abstract: Standard evaluations of Bayesian deep learning methods assume that metric estimates are reliable, but we show this assumption fails under data scarcity. Method rankings are not only unreliable at small $…