Sahar Admoni, Ofra Amir, Assaf Hallak, Yftah Ziser

Aligning What LLMs Do and Say: Towards Self-Consistent Explanations

Sahar Admoni, Ofra Amir, Assaf Hallak, Yftah Ziser / April 14, 2026

arXiv:2506.07523v3 Announce Type: replace
Abstract: Large language models (LLMs) seem to offer an easy path to interpretability: just ask them to explain their answers. Yet the features driving an answer often differ from those emphasized in its expla…
