cs.CL, cs.LG

Diffusion-State Policy Optimization for Masked Diffusion Language Models

arXiv:2602.06462v3 Announce Type: replace
Abstract: Masked diffusion language models generate text through iterative masked-token filling, but terminal-only rewards on final completions provide coarse credit assignment for the intermediate filling dec…