cs.AI, cs.CY, cs.HC

Who Defines “Best”? Towards Interactive, User-Defined Evaluation of LLM Leaderboards

arXiv:2604.21769v1 Announce Type: new
Abstract: LLM leaderboards are widely used to compare models and guide deployment decisions. However, leaderboard rankings are shaped by evaluation priorities set by benchmark designers, rather than by the diverse…