Junhao Liu, Haonan Yu, Zhenyu Yan, Xin Zhang

Revitalizing Black-Box Interpretability: Actionable Interpretability for LLMs via Proxy Models

April 13, 2026

arXiv:2505.12509v3 Announce Type: replace
Abstract: Post-hoc explanations provide transparency and are essential for guiding model optimization, such as prompt engineering and data sanitation. However, applying model-agnostic techniques to Large Langu…
