Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation
arXiv:2603.18893v2 Announce Type: replace
Abstract: Tracking the internal states of large language models across conversations is important for safety, interpretability, and model welfare, yet current methods are limited. Linear probes and other white…