Model Spec Midtraining: Improving How Alignment Training Generalizes
arXiv:2605.02087v1 Announce Type: new
Abstract: Some frontier AI developers aim to align language models to a Model Spec or Constitution that describes the intended model behavior. However, standard alignment fine-tuning — training on demonstrations …