Mechanisms of Introspective Awareness
arXiv:2603.21396v3 Announce Type: replace
Abstract: Recent work has shown that large language models (LLMs) can sometimes detect when steering vectors are injected into their residual stream and identify the injected concept, a phenomenon termed “introspective awareness.” …
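
For readers unfamiliar with the setup, the following is a minimal sketch of what "injecting a steering vector into the residual stream" can mean, assuming a toy residual stream stored as plain Python lists. The names `inject_steering_vector`, `projection`, and `alpha`, along with the toy values, are hypothetical illustrations and are not taken from the paper. The projection only measures how far the activations moved along the injected direction; it is not the detection mechanism the paper studies.

```python
from typing import List

Vector = List[float]


def inject_steering_vector(
    residual: List[Vector], steering: Vector, alpha: float
) -> List[Vector]:
    """Return a copy of the residual stream with alpha * steering
    added at every token position (the input is left unmodified)."""
    if any(len(h) != len(steering) for h in residual):
        raise ValueError("hidden size mismatch between residual and steering")
    return [
        [h_i + alpha * s_i for h_i, s_i in zip(h, steering)]
        for h in residual
    ]


def projection(h: Vector, direction: Vector) -> float:
    """Scalar projection of h onto direction (direction need not be unit length)."""
    norm = sum(d * d for d in direction) ** 0.5
    return sum(a * b for a, b in zip(h, direction)) / norm


# Toy residual stream: 2 token positions, hidden size 3 (assumed values).
residual = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
steering = [0.0, 0.0, 1.0]  # hypothetical "concept" direction
steered = inject_steering_vector(residual, steering, alpha=4.0)

# Shift of the first token's activation along the steering direction.
shift = projection(steered[0], steering) - projection(residual[0], steering)
```

In a real model the same addition would typically be applied to the hidden states of a chosen layer during the forward pass; the layer, the scale, and how the steering direction is obtained are all choices this sketch leaves open.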