Gradient-Free Noise Optimization for Reward Alignment in Generative Models
arXiv:2605.11347v2
Abstract: Existing reward alignment methods for diffusion and flow models rely on multi-step stochastic trajectories, making them difficult to extend to deterministic generators. A natural alternative is…