Cross-Lingual Jailbreak Detection via Semantic Codebooks
arXiv:2604.25716v1
Abstract: Safety mechanisms for large language models (LLMs) remain predominantly English-centric, creating systematic vulnerabilities in multilingual deployment. Prior work shows that translating malicious prompt…