cs.AI

Internalizing Safety Understanding in Large Reasoning Models via Verification

arXiv:2605.08930v1 Announce Type: new
Abstract: While explicit Chain-of-Thought (CoT) empowers large reasoning models (LRMs), it also enables the generation of riskier final answers. Current alignment paradigms primarily rely on externally enforced complia…