cs.AI, cs.CL

Negative Advantage Is a Double-Edged Sword: Calibrating Advantage in GRPO for Deep Search

arXiv:2604.18235v1 Announce Type: new
Abstract: Deep search agents can autonomously initiate multi-turn interactions with search engines, thereby exhibiting strong question-answering capabilities. Such performance critically relies on Group Relative P…