A Tale of Two Variances: When Single-Seed Benchmarks Fail in Bayesian Deep Learning
arXiv:2604.23114v1 Announce Type: new
Abstract: In limited-data settings, a single endpoint mean of an evaluation metric such as the Continuous Ranked Probability Score (CRPS) is itself a random variable, yet it is routinely reported as if it were a s…
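
To illustrate the point that a mean CRPS computed on a small test set is a random variable, here is a minimal toy sketch. It is not the paper's experimental setup. It assumes a Gaussian predictive distribution, for which CRPS has a well-known closed form, and it uses a hypothetical stand-in for a training run in which the seed perturbs the predicted mean and draws the test set. The names `mean_crps_for_seed` and `seed_spread`, the bias scale of 0.3, and the test-set size of 30 are all assumptions made for illustration.

```python
import math
import random
import statistics


def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast against observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal density at z
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF at z
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def mean_crps_for_seed(seed, n_test=30):
    """Toy stand-in for one run: the seed sets a forecast bias and draws a small test set."""
    rng = random.Random(seed)
    mu_hat = rng.gauss(0.0, 0.3)      # hypothetical seed-dependent error in the predicted mean
    sigma_hat = 1.0                   # predictive standard deviation, held fixed here
    ys = [rng.gauss(0.0, 1.0) for _ in range(n_test)]
    return statistics.fmean(crps_gaussian(mu_hat, sigma_hat, y) for y in ys)


def seed_spread(seeds, n_test=30):
    """Mean and sample standard deviation of the endpoint mean CRPS across seeds."""
    scores = [mean_crps_for_seed(s, n_test) for s in seeds]
    return statistics.fmean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    mean, sd = seed_spread(range(20))
    print(f"mean CRPS over 20 seeds: {mean:.4f}, across-seed std: {sd:.4f}")
```

Reporting the across-seed standard deviation alongside the mean, rather than the score from one seed, is the simplest way to make this variability visible. In this toy the spread mixes test-set sampling noise with seed-induced model variation, and the sketch does not attempt to separate the two.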