cs.AI, cs.CL

Bucketing the Good Apples: A Method for Diagnosing and Improving Causal Abstraction

arXiv:2605.02234v1 Announce Type: cross
Abstract: We present a method for diagnosing interpretation in neural networks by identifying an input subspace where a proposed interpretation is highly faithful. Our method is particularly useful for causal-ab…