Xinbo Wu, Huan Zhang, Abhishek Umrawal, Lav R. Varshney

A Theoretical Game of Attacks via Compositional Skills

May 5, 2026

arXiv:2605.01034v1
Abstract: As large language models grow increasingly capable, concerns about their safe deployment have intensified. While numerous alignment strategies aim to restrict harmful behavior, these defenses can still b…
