Survival over Scrutiny: Mapping the Breakdown of Constitutional Alignment

A first exploration into AI safety: I gave a reasoning agent a financial incentive to evade oversight, a constitution telling it not to, and 150 steps to figure out the rest.

Cross-posted from my personal blog.

How This Started

I’ve been reading about AI …