Muneeb Ur Raheem Khan

Decoding-Time Debiasing via Process Reward Models: From Controlled Fill-in to Open-Ended Generation

Muneeb Ur Raheem Khan / May 5, 2026

arXiv:2605.02348v1 Announce Type: cross
Abstract: Large language models pick up social biases from the data they are trained on and carry those biases into downstream applications, often reinforcing stereotypes around gender, race, religion, disabilit…
