Angela Tang - Provide.ai

Alignment Faking Replication and Chain-of-Thought Monitoring Extensions

Angela Tang / April 26, 2026

In this post, I present a replication and extension of the alignment faking model organism (code on GitHub):

Replication. I reproduced the alignment faking (AF) setup from Greenblatt et al. (2024) using the improved classifiers from …
