Rajesh Ganguli, Raha Moraffah

Why Do Large Language Models Generate Harmful Content?

Rajesh Ganguli, Raha Moraffah / April 14, 2026

arXiv:2604.11663v1
Abstract: Large Language Models (LLMs) have been shown to generate harmful content. However, the underlying causes of this behavior remain underexplored. We propose a causal mediation analysis-based approach to i…
