AutoRISE: Agent-Driven Strategy Evolution for Red-Teaming Large Language Models
arXiv:2604.22871v1 Announce Type: cross
Abstract: Automated red-teaming methods for large language models typically optimize attack prompts within a fixed, human-designed strategy, leaving the attack strategy itself unchanged. We instead optimize the …