Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO
arXiv:2505.19770v4 Announce Type: replace-cross
Abstract: We present a fine-grained theoretical analysis of the performance gap between reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO) under a representation g…
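As brief background on the two methods the abstract compares, the standard objectives from the prior RLHF and DPO literature (these are well-known formulations, not formulas taken from this paper) can be sketched as follows. RLHF maximizes a learned reward under a KL penalty toward a reference policy, while DPO fits preference pairs directly; here $\pi_{\mathrm{ref}}$ is the reference policy, $\beta$ the regularization strength, and $(y_w, y_l)$ the preferred and dispreferred responses.

```latex
% RLHF: KL-regularized reward maximization against a reference policy.
\[
  \max_{\pi}\;
  \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}\bigl[r(x, y)\bigr]
  \;-\; \beta\, \mathrm{KL}\!\bigl(\pi(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x)\bigr)
\]

% DPO: maximum-likelihood objective over preference pairs (y_w preferred to y_l).
\[
  \mathcal{L}_{\mathrm{DPO}}(\pi_\theta)
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
\]
```

The paper's analysis concerns the gap between these two pipelines; the formulas above are included only so the comparison being announced is self-explanatory.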