cs.CL

RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models

arXiv:2601.03699v2 Announce Type: replace
Abstract: As large language models (LLMs) become integral to safety-critical applications, ensuring their robustness against adversarial prompts is paramount. However, existing red teaming datasets suffer from…