CorrSteer: Generation-Time LLM Steering via Correlated Sparse Autoencoder Features
arXiv:2508.12535v3 Announce Type: replace
Abstract: Sparse Autoencoders (SAEs) can extract interpretable features from large language models (LLMs) without supervision. However, their effectiveness in downstream steering tasks is limited by the requir…