Nay Myat Min, Long H. Pham, Jun Sun

Layerwise Convergence Fingerprints for Runtime Misbehavior Detection in Large Language Models

Nay Myat Min, Long H. Pham, Jun Sun / April 28, 2026

arXiv:2604.24542v1 Announce Type: cross
Abstract: Large language models deployed at runtime can misbehave in ways that clean-data validation cannot anticipate: training-time backdoors lie dormant until triggered, jailbreaks subvert safety alignment, a…
