Deployment-Relevant Alignment Cannot Be Inferred from Model-Level Evaluation Alone
arXiv:2605.04454v1 Announce Type: cross
Abstract: Alignment evaluation in machine learning has largely become evaluation of models. Influential benchmarks score model outputs under fixed inputs for properties such as truthfulness, instruction following, or pairwise…