Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Architectures
arXiv:2604.16042v2 Announce Type: cross
Abstract: While Large Language Models (LLMs) have achieved strong performance across many NLP tasks, their opaque internal mechanisms hinder trustworthiness and safe deployment. Existing surveys in explainable A…