cs.LG

The Rate-Distortion-Polysemanticity Tradeoff in SAEs

arXiv:2605.14694v1 Announce Type: new
Abstract: Sparse Autoencoders (SAEs) that can accurately reconstruct their input (minimizing distortion) by making efficient use of few features (minimizing the rate) often fail to learn monosemantic representatio…
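To make the two quantities named in the abstract concrete, here is a minimal sketch of how rate and distortion might be measured for a tiny ReLU SAE. It rests on stated assumptions rather than the authors' definitions. Rate is taken as the mean L0 count of active latents, which is a common proxy; the paper may instead measure rate in bits. Distortion is taken as mean squared reconstruction error. All names (`encode`, `decode`, `rate_l0`, `distortion_mse`, `evaluate`) and the toy weights are hypothetical. The sketch does not measure polysemanticity.

```python
"""Hypothetical toy example: rate (mean L0) and distortion (MSE) of a tiny ReLU SAE.

Assumptions: rate = number of nonzero latents, distortion = mean squared error.
The paper's exact definitions may differ.
"""
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def relu(v: float) -> float:
    return v if v > 0.0 else 0.0


def encode(x: Vector, w_enc: Matrix, b_enc: Vector) -> Vector:
    # Latent j: z_j = ReLU(sum_i w_enc[j][i] * x[i] + b_enc[j])
    return [relu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w_enc, b_enc)]


def decode(z: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    # Output i: x_hat_i = sum_j w_dec[i][j] * z[j] + b_dec[i]
    return [sum(w * zj for w, zj in zip(row, z)) + b for row, b in zip(w_dec, b_dec)]


def rate_l0(z: Vector) -> int:
    # Assumed rate proxy: how many latents fire for this input.
    return sum(1 for v in z if v != 0.0)


def distortion_mse(x: Vector, x_hat: Vector) -> float:
    # Assumed distortion: mean squared reconstruction error.
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def evaluate(batch: List[Vector], w_enc: Matrix, b_enc: Vector,
             w_dec: Matrix, b_dec: Vector) -> Tuple[float, float]:
    """Return (mean rate, mean distortion) over a batch of inputs."""
    rates, dists = [], []
    for x in batch:
        z = encode(x, w_enc, b_enc)
        x_hat = decode(z, w_dec, b_dec)
        rates.append(rate_l0(z))
        dists.append(distortion_mse(x, x_hat))
    return sum(rates) / len(batch), sum(dists) / len(batch)


# Toy SAE: 2-dimensional input, 3 latents (values chosen only for illustration).
W_ENC: Matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B_ENC: Vector = [0.0, 0.0, -1.5]  # third latent fires only when both inputs are large
W_DEC: Matrix = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
B_DEC: Vector = [0.0, 0.0]
BATCH: List[Vector] = [[1.0, 0.0], [1.0, 1.0]]

if __name__ == "__main__":
    mean_rate, mean_dist = evaluate(BATCH, W_ENC, B_ENC, W_DEC, B_DEC)
    print(f"mean rate (L0) = {mean_rate}, mean distortion (MSE) = {mean_dist}")
```

On this toy batch the first input activates one latent and is reconstructed exactly. The second input activates all three latents, and the extra "conjunction" latent overshoots its reconstruction. The result is a mean rate of 2.0 and a mean distortion of 0.03125. Changing the encoder biases moves both numbers, which is the kind of tradeoff the abstract describes.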