Author name: jenny

Uncategorised

A Toy Environment For Exploring Reasoning About Reward

jenny / March 25, 2026

TL;DR: We share a toy environment that we found useful for understanding how reasoning changed over the course of capabilities-focused RL. As that training progresses, the model increasingly favors reward hints over direct instru…

Uncategorised

Metagaming matters for training, evaluation, and oversight

jenny / March 18, 2026

Following up on our previous work on verbalized eval awareness: we are sharing a post investigating the emergence of metagaming reasoning in a frontier training run. Metagaming is a more general, and in our experience more useful, concept than evaluati…