Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis
arXiv:2605.14392v1 Announce Type: new
Abstract: We pursue a vision for self-improving language models in which the model does not merely generate problems or traces to imitate, but constructs the environments that train it. In zero-data reasoning RL, …