Easy Samples Are All You Need: Self-Evolving LLMs via Data-Efficient Reinforcement Learning
arXiv:2604.18639v1 Announce Type: new
Abstract: Previous LLMs-based RL studies typically follow either supervised learning with high annotation costs, or unsupervised paradigms using voting or entropy-based rewards. However, their performance remains …