Enhanced LLM Reasoning by Optimizing Reward Functions with Search-Driven Reinforcement Learning
arXiv:2605.02073v1 Announce Type: new
Abstract: Mathematical reasoning is a key benchmark for large language models. Reinforcement learning is a standard post-training mechanism for improving the reasoning capabilities of large language models, yet pe…