cs.CV, cs.LG

Towards General Preference Alignment: Diffusion Models at Nash Equilibrium

arXiv:2605.04494v1 Announce Type: cross
Abstract: Reinforcement learning from human feedback (RLHF) has become a popular paradigm for aligning text-to-image (T2I) diffusion models with human preferences. As a mainstream branch of RLHF, Direct Preference Optimizati…
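
The abstract is truncated above, but the method it names, Direct Preference Optimization (DPO, Rafailov et al., 2023), is standardly defined by a pairwise logistic loss on implicit reward margins between a policy and a frozen reference model; for diffusion models, the log-probabilities are typically taken over the denoising trajectory (as in Diffusion-DPO). Below is a minimal sketch of that standard objective, not of this paper's method; the function and tensor names are illustrative assumptions.

```python
# Minimal sketch of the standard DPO loss (Rafailov et al., 2023); this is
# generic background for the abstract above, not the paper's own algorithm.
import torch
import torch.nn.functional as F


def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Pairwise DPO loss over a batch of preference pairs.

    Each argument is a tensor of per-sample log-probabilities:
      *_w: preferred ("winning") sample, *_l: dispreferred ("losing") sample,
      policy_*: current model, ref_*: frozen reference model.
    """
    # Implicit reward margin: beta * [log(pi/pi_ref)(y_w) - log(pi/pi_ref)(y_l)].
    margin = beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))
    # Logistic (Bradley-Terry) loss on the margin.
    return -F.logsigmoid(margin).mean()


# Toy usage with random log-probabilities for a batch of 4 preference pairs.
if __name__ == "__main__":
    g = torch.Generator().manual_seed(0)
    lp = [torch.randn(4, generator=g) for _ in range(4)]
    print(dpo_loss(*lp).item())
```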