Nan Lu, Ethan Lee, Ethan X. Fang, Junwei Lu

Contextual Online Uncertainty-Aware Preference Learning for Human Feedback

May 1, 2026

arXiv:2504.19342v3
Abstract: Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm in artificial intelligence to align large models with human preferences. In this paper, we propose a novel statis…
