cs.AI

Data-driven Circuit Discovery for Interpretability of Language Models

arXiv:2605.09129v1 Announce Type: new
Abstract: Circuit discovery aims to explain how language models (LMs) implement a specific task by localizing and interpreting a circuit, a computational subgraph responsible for the LM’s behavior. Existing circui…