Asaf Avrahamy, Yoav Gur-Arieh, Mor Geva

Disentangling MLP Neuron Weights in Vocabulary Space

Asaf Avrahamy, Yoav Gur-Arieh, Mor Geva / April 8, 2026

arXiv:2604.06005v1 Announce Type: new
Abstract: Interpreting the information encoded in model weights remains a fundamental challenge in mechanistic interpretability. In this work, we introduce ROTATE (Rotation-Optimized Token Alignment in weighT spac…
