Authors: Rachel Ma, Dylan Hadfield-Menell, Kristjan Greenewald

Distributional Process Reward Models: Calibrated Prediction of Future Rewards via Conditional Optimal Transport

Rachel Ma, Dylan Hadfield-Menell, Kristjan Greenewald / May 13, 2026

arXiv:2605.06785v2 Announce Type: replace
Abstract: Inference-time scaling methods rely on Process Reward Models (PRMs), which are often poorly calibrated and overestimate success probabilities. We propose, to our knowledge, the first use of condition…

cs.AI, cs.LG

Distributional Process Reward Models: Calibrated Prediction of Future Rewards via Conditional Optimal Transport

Rachel Ma, Dylan Hadfield-Menell, Kristjan Greenewald / May 11, 2026

arXiv:2605.06785v1 Announce Type: cross
Abstract: Inference-time scaling methods rely on Process Reward Models (PRMs), which are often poorly calibrated and overestimate success probabilities. We propose, to our knowledge, the first use of conditional…