cs.LG

Best Policy Learning from Trajectory Preference Feedback

arXiv:2501.18873v4 Announce Type: replace
Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a powerful approach for aligning generative models, but its reliance on learned reward models makes it vulnerable to mis-specification…
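For context only (not taken from the abstract itself): reward learning in RLHF from trajectory preferences is usually formalized with a trajectory-level Bradley-Terry model, in which a learned reward model r_\theta scores whole trajectories and is fit by maximum likelihood on pairwise comparisons. The sketch below uses assumed notation (trajectories \tau^{+}, \tau^{-}, states s_t, actions a_t) and is standard background rather than this paper's specific method.

\[
P_\theta(\tau^{+} \succ \tau^{-})
  = \frac{\exp\!\Big(\sum_{t} r_\theta(s_t^{+}, a_t^{+})\Big)}
         {\exp\!\Big(\sum_{t} r_\theta(s_t^{+}, a_t^{+})\Big) + \exp\!\Big(\sum_{t} r_\theta(s_t^{-}, a_t^{-})\Big)},
\qquad
\mathcal{L}(\theta) = -\,\mathbb{E}_{(\tau^{+},\,\tau^{-})}\big[\log P_\theta(\tau^{+} \succ \tau^{-})\big].
\]

Mis-specification of r_\theta in this pipeline is the vulnerability the abstract refers to: a policy optimized against an imperfect learned reward can exploit its errors rather than satisfy the underlying preferences.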