ANCORA: Learning to Question via Manifold-Anchored Self-Play for Verifiable Reasoning
arXiv:2604.27644v2 Announce Type: replace-cross
Abstract: We propose a paradigm shift toward open-ended curriculum self-play. Rather than learning to answer on a fixed prompt set, a unified policy learns to question, generating verifiable problems, so…