cs.AI, cs.HC, cs.LG, cs.SE

Deployment-Relevant Alignment Cannot Be Inferred from Model-Level Evaluation Alone

arXiv:2605.04454v1 Announce Type: cross
Abstract: Alignment evaluation in machine learning has largely become evaluation of models. Influential benchmarks score model outputs under fixed inputs on criteria such as truthfulness, instruction following, or pairwise…