Merging Triggers, Breaking Backdoors: Defensive Poisoning for Instruction-Tuned Language Models
arXiv:2601.04448v3 Announce Type: replace
Abstract: Large Language Models (LLMs) have greatly advanced Natural Language Processing (NLP), particularly through instruction tuning, which enables broad task generalization without additional fine-tuning. …