Harsh Goel, Akhil Udathu, Susmija Jabireddy, Pradnesh Kalkar, Atharva Parulekar

S^3-R1: Learning to Retrieve and Answer Step-by-Step with Synthetic Data

May 5, 2026

arXiv:2605.01248v1
Abstract: Reinforcement learning (RL) post-training has enabled newer capabilities in models, such as agentic tool-use for search. However, these models struggle primarily due to limitations with sparse outcome-ba…
