Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward
arXiv:2604.09748v1 Announce Type: cross
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is an emerging paradigm that significantly boosts the reasoning abilities of Large Language Models (LLMs) on complex logical tasks, such as mathematic…