Chris Elliott, Einar Urdshals, David Quarel, Daniel Murfet

Interpreting Reinforcement Learning Agents with Susceptibilities

Chris Elliott, Einar Urdshals, David Quarel, Daniel Murfet / May 11, 2026

arXiv:2605.08007v1
Abstract: Susceptibilities are a technique for neural network interpretability that studies the response of posterior expectation values of observables to perturbations of the loss. We generalize this construction…
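The definition in the abstract can be read as a derivative. The sketch below makes that concrete under stated assumptions that are not taken from the paper. It assumes a tempered posterior p_h(w) ∝ exp(-nβ(L(w) + h·ΔL(w))) with a flat prior. Under that assumption, the susceptibility of an observable φ is the derivative of E_h[φ] at h = 0. Differentiating under the integral gives the covariance identity χ = -nβ·Cov_0(φ, ΔL). The 1-D grid, double-well loss, perturbation ΔL, observable φ, and the value of nβ are all hypothetical. This is not the authors' reinforcement-learning construction, and their normalization may differ. The code checks that a finite difference and the covariance form agree.

```python
import math

# Hypothetical toy setup: a 1-D parameter w evaluated on a uniform grid
# (the grid spacing cancels in the normalization, so it is omitted).
GRID = [-3.0 + 6.0 * i / 2000 for i in range(2001)]
N_BETA = 5.0  # assumed product n * beta (sample size times inverse temperature)


def loss(w):
    # hypothetical double-well "training loss" with minima at w = +1 and w = -1
    return (w * w - 1.0) ** 2


def delta_loss(w):
    # hypothetical perturbation direction Delta L(w)
    return w * w


def observable(w):
    # hypothetical observable phi(w)
    return w * w


def expectation(f, h):
    """E_h[f] under p_h(w) proportional to exp(-N_BETA * (L(w) + h * Delta L(w)))."""
    weights = [math.exp(-N_BETA * (loss(w) + h * delta_loss(w))) for w in GRID]
    z = sum(weights)
    return sum(wt * f(w) for wt, w in zip(weights, GRID)) / z


def susceptibility_fd(eps=1e-4):
    # central finite difference of E_h[phi] with respect to h, at h = 0
    return (expectation(observable, eps) - expectation(observable, -eps)) / (2 * eps)


def susceptibility_cov():
    # covariance identity: d/dh E_h[phi] at h = 0 equals -N_BETA * Cov_0(phi, Delta L)
    e_phi = expectation(observable, 0.0)
    e_dl = expectation(delta_loss, 0.0)
    e_prod = expectation(lambda w: observable(w) * delta_loss(w), 0.0)
    return -N_BETA * (e_prod - e_phi * e_dl)


if __name__ == "__main__":
    print(f"finite difference: {susceptibility_fd():.6f}")
    print(f"covariance form:   {susceptibility_cov():.6f}")
```

Because φ and ΔL are chosen to be the same function here, the covariance is a variance, so this toy susceptibility is negative. Other choices of φ and ΔL can give either sign.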
