SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models
arXiv:2604.12617v2
Abstract: The post-training pipeline for diffusion models currently has two stages: supervised fine-tuning (SFT) on curated data and reinforcement learning (RL) with reward models. A fundamental gap sepa…