How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation
arXiv:2605.06605v1
Abstract: Evaluating and predicting the performance of large language models (LLMs) in multi-turn conversational settings is critical yet computationally expensive; key events — e.g., jailbreaks or successful tas…