Provably Extracting the Features from a General Superposition
arXiv:2512.15987v2 Announce Type: replace
Abstract: It is widely believed that complex machine learning models generally encode features through linear representations. This is the foundational hypothesis behind a vast body of work on interpretability…