Unsupervised Process Reward Models
arXiv:2605.10158v1 Announce Type: new
Abstract: Process Reward Models (PRMs) are a powerful mechanism for steering large language model reasoning by providing fine-grained, step-level supervision. However, this effectiveness comes at a significant cos…