cs.CR, cs.LG

AutoRAN: Automated Hijacking of Safety Reasoning in Large Reasoning Models

arXiv:2505.10846v3 Announce Type: replace
Abstract: This paper presents AutoRAN, the first framework to automate the hijacking of internal safety reasoning in large reasoning models (LRMs). At its core, AutoRAN pioneers an execution simulation paradig…