Wilson E. Marcílio-Jr, Danilo M. Eler

Navigating the Concept Space of Language Models

Wilson E. Marcílio-Jr, Danilo M. Eler / March 26, 2026

arXiv:2603.23524v1 Announce Type: cross
Abstract: Sparse autoencoders (SAEs) trained on large language model activations output thousands of features that enable mapping to human-interpretable concepts. The current practice for analyzing these feature…
