RMGAP: Benchmarking the Generalization of Reward Models across Diverse Preferences
arXiv:2605.01831v1 Announce Type: cross
Abstract: Reinforcement Learning from Human Feedback (RLHF) has become the standard paradigm for language model alignment, where reward models directly determine alignment effectiveness. In this work, we focus on how t…