cs.AI, cs.LG

Revitalizing Black-Box Interpretability: Actionable Interpretability for LLMs via Proxy Models

arXiv:2505.12509v3 Announce Type: replace
Abstract: Post-hoc explanations provide transparency and are essential for guiding model optimization tasks such as prompt engineering and data sanitization. However, applying model-agnostic techniques to Large Langu…