Diffusion-State Policy Optimization for Masked Diffusion Language Models
arXiv:2602.06462v3 Announce Type: replace
Abstract: Masked diffusion language models generate text through iterative masked-token filling, but terminal-only rewards on final completions provide coarse credit assignment for the intermediate filling dec…
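The setting described in the abstract can be illustrated with a toy sketch. This is not the paper's method: the random policy, the one-token-per-step schedule, and all names (`fill_step`, `rollout`, `terminal_reward`, `terminal_only_credit`) are assumptions made for illustration. It shows only that when the reward is computed once on the final completion, every intermediate filling decision receives the same learning signal.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol for this toy example


def fill_step(state, vocab, rng):
    """Fill one masked position with a token (toy random policy, an assumption)."""
    masked = [i for i, tok in enumerate(state) if tok == MASK]
    pos = rng.choice(masked)
    tok = rng.choice(vocab)
    new_state = list(state)
    new_state[pos] = tok
    return new_state, (pos, tok)


def rollout(length, vocab, seed=0):
    """Start fully masked and fill one position per step until no mask remains."""
    rng = random.Random(seed)
    state = [MASK] * length
    trajectory = []  # (state before the step, action taken) pairs
    while MASK in state:
        next_state, action = fill_step(state, vocab, rng)
        trajectory.append((state, action))
        state = next_state
    return state, trajectory


def terminal_reward(completion, target):
    """Reward that is observed only on the final completion."""
    return 1.0 if completion == target else 0.0


def terminal_only_credit(trajectory, reward):
    """Assign the same terminal reward to every intermediate filling decision."""
    return [reward for _ in trajectory]


if __name__ == "__main__":
    final, traj = rollout(length=4, vocab=["a", "b"], seed=0)
    credits = terminal_only_credit(traj, terminal_reward(final, ["a", "b", "a", "b"]))
    for (state, action), credit in zip(traj, credits):
        print(state, action, credit)
```

Because `terminal_only_credit` returns one value repeated for every step, a good early fill and a bad late fill are indistinguishable to the learner, which is the coarse credit assignment the abstract refers to.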