AutoRAN: Automated Hijacking of Safety Reasoning in Large Reasoning Models
arXiv:2505.10846v3
Abstract: This paper presents AutoRAN, the first framework to automate the hijacking of internal safety reasoning in large reasoning models (LRMs). At its core, AutoRAN pioneers an execution simulation paradig…