cs.AI, cs.LG, math.OC, stat.ML

Feature Starvation as Geometric Instability in Sparse Autoencoders

arXiv:2605.05341v1 Announce Type: cross
Abstract: Sparse autoencoders (SAEs) are used to disentangle the dense, polysemantic internal representations of large language models (LLMs) into interpretable, monosemantic concepts. However, standard $\ell_1$…