Toward a Better Evaluations Ecosystem

Model evaluations are broken. Numbers cited side by side as evidence of progress are rarely comparable because the underlying methodologies are inconsistent, and AI companies run and report internal evals that are unavailable to the wider community…