Efficient Bayesian Inference from Noisy Pairwise Comparisons
arXiv:2510.09333v2 Announce Type: replace-cross
Abstract: Evaluating generative models is challenging because standard metrics often fail to reflect human preferences. Human evaluations are more reliable but costly and noisy, as participants vary in e…