S^3-R1: Learning to Retrieve and Answer Step-by-Step with Synthetic Data
arXiv:2605.01248v1 Announce Type: new
Abstract: Reinforcement learning (RL) post-training has enabled new capabilities in models, such as agentic tool use for search. However, these models struggle primarily due to limitations with sparse outcome-ba…