ReGA: Model-Based Safeguard for LLMs via Representation-Guided Abstraction
arXiv:2506.01770v2 Announce Type: replace-cross
Abstract: Large Language Models (LLMs) have achieved tremendous success across a wide range of tasks, yet concerns about their safety and security have emerged. In particular, they pose risks of generating harmful c…