Domain-Filtered Knowledge Graphs from Sparse Autoencoder Features
arXiv:2604.23829v2 Announce Type: new
Abstract: Sparse autoencoders (SAEs) extract millions of interpretable features from a language model, but flat feature inventories aren’t very useful on their own. Domain concepts get mixed with generic and weakl…