What If LLMs Learn Better from Language Than from Rewards?
Rethinking LLM Optimization through TextGrad, MIPRO, and GEPA
For the past couple of years, reinforcement learning (RL) has been the dominant paradigm for adapting large language models (LLMs) to downstream tasks. Methods like PPO and GRPO treat model b…