Contextual Online Uncertainty-Aware Preference Learning for Human Feedback
arXiv:2504.19342v3 Announce Type: replace-cross
Abstract: Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm in artificial intelligence for aligning large models with human preferences. In this paper, we propose a novel statis…