Automating Interpretability with Agents

This work was produced as part of the SPAR Program – Fall 2025 Cohort, with support from Georg Lange.

Kseniya Parkhamchuk, Jack Payne

TL;DR

Automated feature explanations from Delphi fail 38% of the time. The failures are sensitivity issues, output featur…