cs.AI, cs.LG

A Systematic Investigation of The RL-Jailbreaker in LLMs

arXiv:2605.07032v1 Announce Type: cross
Abstract: The evolution of generative models from next-token predictors to autonomous engines of complex systems necessitates rigorous safety hardening. Adversarial jailbreaking, the strategic manipulation of mo…