Donald Ye - Provide.ai

Hidden Heroes and Gradient Bloats: Layer-Wise Redundancy Inverts Attribution in Transformers

Donald Ye / May 12, 2026

arXiv:2602.01442v3
Abstract: Gradient-based attribution is the workhorse of mechanistic interpretability, yet whether it reliably tracks causal importance at the component level remains largely untested. We causally evalua…
