cs.CL, cs.LG

Tracking Equivalent Mechanistic Interpretations Across Neural Networks

arXiv:2603.30002v1 Announce Type: cross
Abstract: Mechanistic interpretability (MI) is an emerging framework for interpreting neural networks. Given a task and model, MI aims to discover a succinct algorithmic process, an interpretation, that explains…