cs.AI, cs.CL

Phonetic Perturbations Reveal Tokenizer-Rooted Safety Gaps in LLMs

arXiv:2505.14226v5
Abstract: Safety-aligned LLMs remain vulnerable to digital-writing phenomena such as textese, which perturb words into non-canonical spellings while preserving their phonetics. We introduce CMP-RT (code-mixed phonetic …