SEARL: Joint Optimization of Policy and Tool Graph Memory for Self-Evolving Agents
arXiv:2604.07791v2
Abstract: Recent advances in Reinforcement Learning with Verifiable Rewards (RLVR) have demonstrated significant potential in single-turn reasoning tasks. With the paradigm shift toward self-evolving age…