cs.CL, cs.LG

Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization

arXiv:2604.07343v1 Announce Type: cross
Abstract: Pluralistic alignment has emerged as a critical frontier in the development of Large Language Models (LLMs), with reward models (RMs) serving as a central mechanism for capturing diverse human values. …