Training a General Purpose Automated Red Teaming Model
arXiv:2604.23067v1 Announce Type: cross
Abstract: Automated methods for red teaming LLMs are important tools for identifying LLM vulnerabilities that may not be covered by static benchmarks, allowing for more thorough probing. They can also adapt to eac…