Who Defines “Best”? Towards Interactive, User-Defined Evaluation of LLM Leaderboards
arXiv:2604.21769v1 Announce Type: new
Abstract: LLM leaderboards are widely used to compare models and guide deployment decisions. However, leaderboard rankings are shaped by evaluation priorities set by benchmark designers, rather than by the diverse…