cs.AI, cs.CL, cs.LG

Entropy Aware Reward Guidance for Diffusion Language Model Alignment

arXiv:2602.05000v2 Announce Type: replace-cross
Abstract: Reward guidance, also known as posterior sampling, is a popular method for test-time adaptation and post-training in continuous diffusion models. In this paper, we study reward guidance for dis…