Jordan F. McCann - Provide.ai

Descriptive Collision in Sparse Autoencoder Auto-Interpretability: When One Explanation Describes Many Features

Jordan F. McCann / May 14, 2026

arXiv:2605.12874v1 Announce Type: new
Abstract: Sparse autoencoders (SAEs) are now standard tools for decomposing language model activations into interpretable features, and automated interpretability pipelines routinely assign each feature a short na…
