cs.CL, cs.CR

Segment-Level Coherence for Robust Harmful Intent Probing in LLMs

arXiv:2604.14865v1 Announce Type: new

Abstract: Large Language Models (LLMs) are increasingly exposed to adaptive jailbreaking, particularly in high-stakes Chemical, Biological, Radiological, and Nuclear (CBRN) domains. Although streaming probes enabl…