cs.AI, cs.CL

Navigating the Concept Space of Language Models

arXiv:2603.23524v1 Announce Type: cross
Abstract: Sparse autoencoders (SAEs) trained on large language model activations produce thousands of features that can be mapped to human-interpretable concepts. The current practice for analyzing these feature…