Tracking Equivalent Mechanistic Interpretations Across Neural Networks
arXiv:2603.30002v1 Announce Type: cross
Abstract: Mechanistic interpretability (MI) is an emerging framework for interpreting neural networks. Given a task and model, MI aims to discover a succinct algorithmic process, an interpretation, that explains…