Yash Ingle, Jaival Chauhan, Ankit Yadav, Sudhakar Mishra

Adaptive Negative Reinforcement for LLM Reasoning: Dynamically Balancing Correction and Diversity in RLVR

May 11, 2026

arXiv:2605.07137v1 Announce Type: new
Abstract: Reinforcement learning with verifiable rewards (RLVR) has become a highly effective method for improving the reasoning abilities of Large Language Models (LLMs). Recent research shows that Negative Sampl…
