cs.AI, cs.CL

Finding and Reactivating Post-Trained LLMs’ Hidden Safety Mechanisms

arXiv:2604.00012v1 Announce Type: new
Abstract: Despite their impressive performance, general-purpose large language models (LLMs) often require fine-tuning or post-training to excel at specific tasks. For instance, large reasoning models (LRMs)…