Model Spec Midtraining: Improving How Alignment Training Generalizes

tl;dr: We introduce model spec midtraining (MSM). After pre-training but before alignment fine-tuning, we train models on synthetic documents that discuss their Model Spec, teaching them how they should behave and why. This controls how models generalize …