RewardBench 2: Advancing Reward Model Evaluation
arXiv:2506.01937v2
Abstract: Reward models are used throughout the post-training of language models to capture nuanced signals from preference data and provide a training target for optimization across instruction following, rea…