Emergent Inference-Time Semantic Contamination via In-Context Priming
arXiv:2604.04043v1 Announce Type: new
Abstract: Recent work has shown that fine-tuning large language models (LLMs) on insecure code or culturally loaded numeric codes can induce emergent misalignment, causing models to produce harmful content in unre…