Improving Sparse Autoencoder with Dynamic Attention
arXiv:2604.14925v1 Announce Type: new
Abstract: Recently, sparse autoencoders (SAEs) have emerged as a promising technique for interpreting activations in foundation models by disentangling features into a sparse set of concepts. However, identifying …
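As background for the abstract's claim that SAEs disentangle activations into a sparse set of concepts, here is a minimal sketch of the standard sparse-autoencoder forward pass with a ReLU encoder and an L1 sparsity penalty. This is an illustrative baseline, not the paper's proposed dynamic-attention variant; all sizes, names, and the `l1_coef` value are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: d_sae > d_model gives an overcomplete dictionary.
d_model, d_sae = 16, 64
W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x, l1_coef=1e-3):
    # Encode: ReLU on an overcomplete projection yields a sparse,
    # non-negative feature code f.
    f = np.maximum(x @ W_enc + b_enc, 0.0)
    # Decode: reconstruct the original activation from active features.
    x_hat = f @ W_dec + b_dec
    # Training objective: reconstruction error plus L1 penalty on f,
    # which pushes most feature activations to exactly zero.
    recon_loss = np.mean((x - x_hat) ** 2)
    sparsity_loss = l1_coef * np.abs(f).sum(axis=-1).mean()
    return x_hat, f, recon_loss + sparsity_loss

# Example: a batch of 8 foundation-model activations (random stand-ins).
x = rng.normal(size=(8, d_model))
x_hat, f, loss = sae_forward(x)
```

After training, each column of `W_dec` acts as a dictionary direction, and the nonzero entries of `f` indicate which learned concepts fire for a given activation.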