cs.LG, stat.ML

Online Distributionally Robust LLM Alignment via Regression to Relative Reward

arXiv:2509.19104v2 Announce Type: replace
Abstract: Reinforcement Learning with Human Feedback (RLHF) has become crucial for aligning Large Language Models (LLMs) with human intent. However, existing offline RLHF approaches suffer from overoptimization…