The Differences Between Direct Alignment Algorithms are a Blur
arXiv:2502.01237v3 Announce Type: replace
Abstract: Direct Alignment Algorithms (DAAs) simplify LLM alignment by directly optimizing policies, bypassing reward modeling and RL. While DAAs differ in their use of SFT (one-stage vs. two-stage) and the sc…