Constructing Interpretable Features from Compositional Neuron Groups
arXiv:2506.10920v2 Announce Type: replace-cross
Abstract: A central goal of mechanistic interpretability has been to identify the right units of analysis in large language models (LLMs) that causally explain their outputs. While early work focused on…