Alignment as Equilibrium Design
Much of the alignment literature starts by asking what “human values”, “ethical behavior”, or “morality” are, and how we can get models to act in accordance with them. This is an important question, but we argue that it can obscure a more fu…