Learning-Based Automated Adversarial Red-Teaming for Robustness Evaluation of Large Language Models
arXiv:2512.20677v4 Announce Type: replace-cross
Abstract: The increasing deployment of large language models (LLMs) in safety-critical applications raises fundamental challenges in systematically evaluating robustness against adversarial behaviors. Ex…