Subhash Kantamneni - Provide.ai

Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations

Subhash Kantamneni / May 7, 2026

Abstract

We introduce Natural Language Autoencoders (NLAs), an unsupervised method for generating natural language explanations of LLM activations. An NLA consists of two LLM modules: an activation verbalizer (AV) that maps an activation to a text descr…