cs.DM, cs.ET, cs.LG, math.OC

Why Global LLM Leaderboards Are Misleading: Small Portfolios for Heterogeneous Supervised ML

arXiv:2605.06656v1 Announce Type: new
Abstract: Ranking LLMs via pairwise human feedback underpins current leaderboards for open-ended tasks, such as creative writing and problem-solving. We analyze ~89K comparisons in 116 languages from 52 LLMs from …