cs.AI, cs.HC, cs.LG

From Attribution to Action: A Human-Centered Application of Activation Steering

arXiv:2604.11467v1 Announce Type: cross
Abstract: Explainable AI (XAI) methods reveal which features influence model predictions, yet provide limited means for practitioners to act on these explanations. Activation steering of components identified vi…