cs.CL, cs.LG, math.OC, stat.ML

Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback

arXiv:2605.00155v1 Announce Type: cross
Abstract: Reinforcement learning from human feedback (RLHF) has become a core post-training step for aligning large language models, yet the reward signal used in RLHF is only a learned proxy for true human util…
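To make the core idea concrete, here is a toy sketch (not the paper's algorithm) of distributionally robust regret minimization: the Wasserstein ambiguity set over reward distributions is stood in for by a small finite set of plausible reward vectors around the learned proxy, and the policy chosen is the one minimizing worst-case regret over that set. All numbers and the discrete action space are illustrative assumptions.

```python
import numpy as np

# Toy stand-in for a Wasserstein uncertainty set: a finite collection
# of plausible reward vectors near the learned proxy reward.
# Rows: candidate reward models; columns: actions (discrete policies).
rewards = np.array([
    [1.0, 0.8, 0.3],   # nominal learned reward model
    [0.7, 1.0, 0.4],   # perturbed reward within the uncertainty set
    [0.9, 0.6, 1.0],   # another plausible perturbation
])

# Regret of action a under reward r: max_a' r[a'] - r[a].
regret = rewards.max(axis=1, keepdims=True) - rewards

# Worst-case regret of each action over the uncertainty set.
worst_case = regret.max(axis=0)

# Robust choice: the action minimizing worst-case regret.
best = int(np.argmin(worst_case))
# worst_case → [0.3, 0.4, 0.7]; best → 0
```

Note that the robust choice need not maximize the nominal reward: it hedges against the proxy reward being wrong, which is exactly the failure mode the abstract identifies in standard RLHF.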