Bucketing the Good Apples: A Method for Diagnosing and Improving Causal Abstraction

Li Puyin, Jiyuan Tan, Ahmad Jabbar, Thomas Icard, Atticus Geiger / May 5, 2026

arXiv:2605.02234v1 Announce Type: cross
Abstract: We present a method for diagnosing interpretation in neural networks by identifying an input subspace where a proposed interpretation is highly faithful. Our method is particularly useful for causal-ab…
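The abstract states the core idea only at a high level: measure how faithful a proposed interpretation is across inputs, then isolate the input region where it holds. The following is a minimal sketch under stated assumptions and is not the authors' procedure. It assumes that inputs have already been grouped into buckets and that each intervention (for example, an interchange intervention, as is common in causal-abstraction work) has been scored as agreeing or disagreeing with the interpretation's prediction. The helpers `bucket_faithfulness` and `select_faithful_buckets`, the `(bucket_key, agreed)` record format, and the threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def bucket_faithfulness(records: Iterable[Tuple[Hashable, bool]]) -> Dict[Hashable, float]:
    """Return the fraction of agreeing interventions per bucket.

    Hypothetical input format: each record is (bucket_key, agreed), where
    `agreed` is True when the network's output under an intervention matched
    the output predicted by the proposed high-level interpretation.
    """
    hits: Dict[Hashable, int] = defaultdict(int)
    totals: Dict[Hashable, int] = defaultdict(int)
    for key, agreed in records:
        totals[key] += 1
        hits[key] += int(agreed)
    # Agreement rate per bucket, in the order buckets were first seen.
    return {key: hits[key] / totals[key] for key in totals}


def select_faithful_buckets(scores: Dict[Hashable, float], threshold: float) -> List[Hashable]:
    """Keep the buckets whose agreement rate meets the (assumed) threshold."""
    return [key for key, score in scores.items() if score >= threshold]


if __name__ == "__main__":
    records = [("a", True), ("a", True), ("a", True), ("a", False), ("b", True), ("b", False)]
    scores = bucket_faithfulness(records)
    print(scores)                                # {'a': 0.75, 'b': 0.5}
    print(select_faithful_buckets(scores, 0.7))  # ['a']
```

With a threshold of 0.7, only bucket `a` (agreement 0.75) is kept, while bucket `b` (0.5) marks a region where the interpretation breaks down. How buckets are actually formed and scored in the paper is not specified in this excerpt.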