SPARK: Self-Play with Asymmetric Reward from Knowledge Graphs
arXiv:2605.05546v1 Announce Type: new
Abstract: Self-play reinforcement learning has shown strong performance in domains with formally verifiable structure, such as mathematics and coding, where both problem generation and reward computation can be gr…