cs.CV, cs.LG

ParetoSlider: Diffusion Models Post-Training for Continuous Reward Control

arXiv:2604.20816v1 Announce Type: new
Abstract: Reinforcement Learning (RL) post-training has become the standard for aligning generative models with human preferences, yet most methods rely on a single scalar reward. When multiple criteria matter, th…