🌱 Course: https://github.com/anakin87/llm-rl-environments-lil-course

I've been deep into RL for LLMs lately.

Over the past year, we've seen a shift in LLM Post-Training. Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can reach new heights without expensive data.

But what actually are these environments in practice? And how do you build them effectively?

Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models.

---

What you'll learn

🧩 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain

🎮 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master that beats GPT-5-mini

If you're interested in building "little worlds" where LLMs can learn, this course is for you.

---

🕹️ Play against the trained model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe

🤗 HF collection with datasets and models: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
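
To give a rough feel for what an "environment with verifiable rewards" can look like, here is a minimal sketch of a Tic Tac Toe environment. It is not the course's code: the `TicTacToeEnv` class, the `<move>` tag format, and the reward values are all assumptions made up for illustration. The idea is only that the model's text output gets parsed, checked against the game rules, and scored by a program rather than by a human label.

```python
import re
from dataclasses import dataclass, field

# All eight ways to get three in a row on a 3x3 board (cells numbered 0-8).
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@dataclass
class TicTacToeEnv:
    """Hypothetical environment: names and rewards are illustrative only."""
    board: list = field(default_factory=lambda: [" "] * 9)

    def render(self) -> str:
        """Text observation that could be placed in the model's prompt."""
        return "\n".join("|".join(self.board[i:i + 3]) for i in (0, 3, 6))

    def step(self, completion: str, player: str = "X"):
        """Parse a move from the model's text and return (reward, done)."""
        # Assumed output format: the model wraps its move in <move> tags.
        match = re.search(r"<move>\s*([0-8])\s*</move>", completion)
        if match is None:
            return -1.0, True   # unparseable output ends the episode
        cell = int(match.group(1))
        if self.board[cell] != " ":
            return -1.0, True   # illegal move: cell already taken
        self.board[cell] = player
        if winner(self.board) == player:
            return 1.0, True    # verifiable win
        if " " not in self.board:
            return 0.0, True    # board full: draw
        return 0.0, False       # game continues


# Usage: X already holds cells 0 and 1, so playing cell 2 wins.
env = TicTacToeEnv(board=["X", "X", " ", "O", "O", " ", " ", " ", " "])
reward, done = env.step("Top row is almost complete. <move>2</move>")
print(reward, done)
```

In an RL loop, the agent would be the LLM, the observation would be the rendered board in the prompt, the action would be the generated text, and a reward like the one above would feed a method such as GRPO. An opponent policy and multi-turn rollouts are left out here to keep the sketch short.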