cs.LG

The Differences Between Direct Alignment Algorithms are a Blur

arXiv:2502.01237v3 Announce Type: replace
Abstract: Direct Alignment Algorithms (DAAs) simplify LLM alignment by optimizing policies directly, bypassing reward modeling and reinforcement learning (RL). While DAAs differ in their use of supervised fine-tuning (SFT; one-stage vs. two-stage) and the sc…