cs.AI, cs.LG

Can You Break RLVER? Probing Adversarial Robustness of RL-Trained Empathetic Agents

arXiv:2605.07138v1 Announce Type: new
Abstract: Reinforcement learning from verifiable emotion rewards (RLVER) has produced language models with strong empathetic performance, evaluated on benchmarks that assume cooperative, honest users. Yet real emoti…