cs.CV

Arena as Offline Reward: Efficient Fine-Grained Preference Optimization for Diffusion Models

arXiv:2605.06070v1 Announce Type: new
Abstract: Reinforcement learning from human feedback (RLHF) effectively promotes preference alignment of text-to-image (T2I) diffusion models. To improve computational efficiency, direct preference optimization (D…
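The abstract above names direct preference optimization (DPO) as the efficient alternative to RLHF. The following is a minimal sketch of the generic pairwise DPO loss for a single preference pair. It is not this paper's fine-grained "arena" method. The function and parameter names (`dpo_loss`, `beta`) are illustrative. The inputs are assumed to be scalar log-likelihoods of a preferred sample `w` and a dispreferred sample `l` under the trainable policy and a frozen reference model. For diffusion models, exact likelihoods are intractable, and variants such as Diffusion-DPO replace these terms with denoising-loss approximations. The sketch keeps the generic form.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Generic DPO loss for one (preferred w, dispreferred l) pair.

    Illustrative sketch only: assumes scalar log-likelihoods are available,
    which diffusion models only approximate.
    """
    # Implicit reward margin: how much more the policy upweights w over l,
    # measured relative to the frozen reference model and scaled by beta.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == softplus(-margin), computed without overflow.
    return softplus(-margin)


if __name__ == "__main__":
    # Policy identical to reference: margin is 0, so the loss is log 2.
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))
    # Policy favors the preferred sample relative to the reference: loss < log 2.
    print(dpo_loss(-1.0, -2.0, -1.5, -1.5, beta=1.0))
```

Because the loss depends only on a log-ratio margin against a fixed reference, it needs no separately trained reward model or on-policy sampling loop. This is the source of the efficiency gain over RLHF that the abstract refers to.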