LearnAlign: Data Selection for LLM Reinforcement Learning with Improved Gradient Alignment
arXiv:2506.11480v4 Announce Type: replace-cross
Abstract: Reinforcement learning with verifiable rewards (RLVR) has become a key technique for enhancing LLMs’ reasoning abilities, yet its data inefficiency remains a major bottleneck. To address this c…