AIPsy-Affect: A Keyword-Free Clinical Stimulus Battery for Mechanistic Interpretability of Emotion in Language Models
arXiv:2604.23719v1
Abstract: Mechanistic interpretability research on emotion in large language models — linear probing, activation patching, sparse autoencoder (SAE) feature analysis, causal ablation, steering vector extraction –…