Hi, it's Lewis from the Hugging Face post-training team! We spent the past month building RL environments in every major framework (verifiers, OpenEnv, Nemo-Gym, OpenRewards, etc.) and training models on them to better understand how the frameworks differ and how they scale along different axes. We're very excited to share another looong blog post on what we found, covering which frameworks work best under which conditions and how to scale RL envs reliably: https://huggingface.co/spaces/AdithyaSK/rl-environments-guide Hope y'all enjoy it, and don't hesitate to leave feedback in the Community tab :)