Step-level Denoising-time Diffusion Alignment with Multiple Objectives
arXiv:2604.14379v1 (announce type: new)
Abstract: Reinforcement learning (RL) has emerged as a powerful tool for aligning diffusion models with human preferences, typically by optimizing a single reward function under a KL regularization constraint. In …