cs.CR, cs.LG

Colluding LoRA: A Compositional Vulnerability in LLM Safety Alignment

arXiv:2603.12681v2 Announce Type: replace-cross
Abstract: We show that safety alignment in modular LLMs can exhibit a compositional vulnerability: adapters that appear benign and plausibly functional in isolation can, when linearly composed, compromis…
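
To make "linearly composed" concrete, the following is a minimal sketch assuming the standard LoRA parameterization, in which each adapter contributes a low-rank update B A to a frozen weight matrix and several adapters are merged by a weighted sum of those updates. The function name `compose_lora`, the shapes, the scale values, and the random weights are illustrative assumptions; this is not the paper's construction and it does not reproduce any attack.

```python
import numpy as np

def compose_lora(base_weight, adapters, scales):
    """Linearly compose LoRA adapters onto a frozen base weight.

    adapters: list of (B, A) pairs, B of shape (d_out, r), A of shape (r, d_in).
    scales:   one scalar per adapter (hypothetical merge coefficients).
    Returns base_weight + sum_i scales[i] * (B_i @ A_i).
    """
    merged = base_weight.copy()
    for (B, A), s in zip(adapters, scales):
        merged += s * (B @ A)  # each adapter adds a low-rank update
    return merged

# Toy dimensions and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 6, 2
W0 = rng.normal(size=(d_out, d_in))
adapter_1 = (rng.normal(size=(d_out, rank)), rng.normal(size=(rank, d_in)))
adapter_2 = (rng.normal(size=(d_out, rank)), rng.normal(size=(rank, d_in)))

W_merged = compose_lora(W0, [adapter_1, adapter_2], scales=[1.0, 1.0])
```

Because the merge is a plain sum, the combined update differs from either adapter's update taken alone, which is why checking each adapter in isolation says nothing directly about the merged weights.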