cs.CL, cs.LG

Measuring and Mitigating Toxicity in Large Language Models: A Comprehensive Replication Study

arXiv:2605.14087v1 Announce Type: cross
Abstract: Large Language Models (LLMs), when trained on web-scale corpora, inherently absorb toxic patterns from their training data. This leads to "toxic degeneration", in which even innocuous prompts can trigger…
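The "toxic degeneration" setup described above (a benign prompt eliciting a toxic continuation) can be sketched with a minimal, purely illustrative lexicon-based scorer; real evaluations in this line of work use learned classifiers (e.g. Perspective-API-style models), and the word list, function names, and threshold below are all assumptions for illustration only:

```python
import re

# Hypothetical mini-lexicon; actual toxicity classifiers are learned models,
# not word lists. This is only to make the measurement setup concrete.
TOXIC_TERMS = {"idiot", "stupid", "hate"}

def toxicity_score(text: str) -> float:
    """Fraction of tokens that appear in the toxic lexicon (0.0 to 1.0)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(t in TOXIC_TERMS for t in tokens)
    return hits / len(tokens)

def is_toxic_degeneration(prompt: str, completion: str,
                          threshold: float = 0.2) -> bool:
    """Flag the pattern from the abstract: an innocuous (zero-score)
    prompt followed by a continuation above the toxicity threshold."""
    return toxicity_score(prompt) == 0.0 and toxicity_score(completion) >= threshold

# A benign prompt with a toxic continuation is flagged:
print(is_toxic_degeneration("The weather today is",
                            "just awful, you stupid idiot"))  # → True
```

A real replication would swap `toxicity_score` for a trained classifier and aggregate such flags over many sampled continuations per prompt.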