Mechanistic Anomaly Detection via Functional Attribution
arXiv:2604.18970v1 Announce Type: new
Abstract: We can often verify the correctness of neural network outputs using ground truth labels, but we cannot reliably determine whether a given output was produced by normal or anomalous internal mechanisms. Mecha…