cs.AI

Patching LLM Like Software: A Lightweight Method for Improving Safety Policy in Large Language Models

arXiv:2511.08484v2 Announce Type: replace
Abstract: We propose patching large language models (LLMs) the way software versions are patched, a lightweight and modular approach to addressing safety vulnerabilities. While vendors release improved LLM versions, maj…