For years, MMLU was the benchmark. If a research paper or product announcement wanted to prove a model was smart, MMLU scores were front…