Optimal Aggregation of LLM and PRM Signals for Efficient Test-Time Scaling
arXiv:2510.13918v2
Abstract: Process reward models (PRMs) are a cornerstone of test-time scaling (TTS), designed to verify and select the best responses from large language models (LLMs). However, this promise is challenged by r…