Thomas Read - Provide.ai


Reproducing steering against evaluation awareness in a large open-weight model

Thomas Read / April 10, 2026

Produced as part of the UK AISI Model Transparency Team. Our team works on ensuring models don’t subvert safety assessments, e.g. through evaluation awareness, sandbagging, or opaque reasoning.

TL;DR We replicate Anthropic’s approach to using steering v…
