Revitalizing Black-Box Interpretability: Actionable Interpretability for LLMs via Proxy Models
arXiv:2505.12509v3
Abstract: Post-hoc explanations provide transparency and are essential for guiding model optimization, such as prompt engineering and data sanitation. However, applying model-agnostic techniques to Large Langu…