Why Global LLM Leaderboards Are Misleading: Small Portfolios for Heterogeneous Supervised ML
arXiv:2605.06656v1 Announce Type: new
Abstract: Ranking LLMs via pairwise human feedback underpins current leaderboards for open-ended tasks such as creative writing and problem-solving. We analyze ~89K comparisons spanning 116 languages and 52 LLMs from …