Weight Patching: Toward Source-Level Mechanistic Localization in LLMs
arXiv:2604.13694v1
Abstract: Mechanistic interpretability seeks to localize model behavior to the internal components that causally realize it. Prior work has advanced activation-space localization and causal tracing, but modules th…