DDO-RM for LLM Preference Optimization: A Minimal Held-Out Benchmark against DPO
arXiv:2604.11119v1
Abstract: This paper reorganizes the manuscript around the DPO versus DDO-RM preference-optimization project, focusing on two parts: the algorithmic view and the preliminary held-out benchmark. The bench…