Mohammad Rezaei, Jens Lehmann, Sahar Vahdati

LLM Reasoning with Process Rewards for Outcome-Guided Steps

Mohammad Rezaei, Jens Lehmann, Sahar Vahdati / April 6, 2026

arXiv:2604.02341v1 Announce Type: new
Abstract: Mathematical reasoning in large language models has improved substantially with reinforcement learning using verifiable rewards, where final answers can be checked automatically and converted into reliab…