Introspection Adapters: Training LLMs to Report Their Learned Behaviors

Authors: Keshav Shenoy, Li Yang, Abhay Sheshadri, Soren Mindermann, Jack Lindsey, Sam Marks, and Rowan Wang

📄 Paper, 💻 Code, 🤖 Models

TL;DR: We introduce introspection adapters (IA), a technique for training an LLM to self-report behaviors it learned durin…