cs.AI, cs.CR, cs.LG, cs.SE

ReGA: Model-Based Safeguard for LLMs via Representation-Guided Abstraction

arXiv:2506.01770v2 Announce Type: replace-cross
Abstract: Large Language Models (LLMs) have achieved tremendous success in various tasks, yet concerns about their safety and security have emerged. In particular, they pose risks of generating harmful c…