Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion
arXiv:2604.10326v1 Announce Type: cross
Abstract: Large language models remain vulnerable to jailbreak attacks — inputs designed to bypass safety mechanisms and elicit harmful responses — despite advances in alignment and instruction tuning. We prop…