Pedro Conde, Henrique Branquinho, Valerio Mazzone, Bruno Mendes, André Baptista, Nuno Moniz

From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World

May 12, 2026

arXiv:2605.10834v1
Abstract: AI pentesting agents are increasingly credible as offensive security systems, but current benchmarks still provide limited guidance on which will perform best in real-world targets. Existing evaluation p…
