cs.LG

PubSwap: Public-Data Off-Policy Coordination for Federated RLVR

arXiv:2604.12160v1 Announce Type: new
Abstract: Reasoning post-training with reinforcement learning from verifiable rewards (RLVR) is typically studied in centralized settings, yet many realistic applications involve decentralized private data distrib…