The Case for Evaluating Model Behaviors

Most evaluations of AI systems focus on their capabilities: how good they are at coding tasks, how effectively they can answer complex scientific questions, and so on. From a safety perspective, capability evaluations have a place: by understanding how …