Toward Principled LLM Safety Testing: Solving the Jailbreak Oracle Problem
arXiv:2506.17299v2 Announce Type: replace-cross
Abstract: As large language models (LLMs) are increasingly deployed in safety-critical applications, the lack of systematic methods to assess their vulnerability to jailbreak attacks presents a critic…