Hongliang Liu, Tung-Ling Li, Yuhao Wu

Perturbation Probing: A Two-Pass-per-Prompt Diagnostic for FFN Behavioral Circuits in Aligned LLMs

May 1, 2026

arXiv:2604.27401v1 Announce Type: cross
Abstract: Perturbation probing generates task-specific causal hypotheses for FFN neurons in large language models using two forward passes per prompt and no backpropagation, followed by a one-time intervention s…
