From Prompt Risk to Response Risk: Paired Analysis of Safety Behavior of Large Language Models
arXiv:2604.26052v1 Announce Type: new
Abstract: Safety evaluations of large language models (LLMs) typically report binary outcomes such as attack success rate, refusal rate, or harmful/not-harmful response classification. While useful, these metrics can hide…