MachineLearning

LLM rankings are not a ladder: experimental results from a transitive benchmark graph [D]

I built a small website called LLM Win: https://llm-win.com

It turns LLM benchmark results into a directed graph: if model A beats model B on benchmark X, add an edge A -> B. Then it searches for the shortest transitive chain between two model…
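
To make the edge rule and the chain search concrete, here is a minimal Python sketch. It is not the site's actual code: the model names, benchmark names, and scores are made up, "beats" is assumed to mean a strictly higher score, and plain breadth-first search is assumed for the shortest chain.

```python
from collections import deque
from itertools import permutations


def build_graph(scores):
    """Build a directed "beats" graph.

    scores: {benchmark: {model: score}}. Assumption: higher is better,
    and a tie produces no edge in either direction.
    Returns {winner: {loser: benchmark_that_justifies_the_edge}}.
    """
    graph = {}
    for benchmark, results in scores.items():
        for a, b in permutations(results, 2):
            if results[a] > results[b]:
                # Keep the first benchmark seen as the edge's evidence.
                graph.setdefault(a, {}).setdefault(b, benchmark)
    return graph


def shortest_chain(graph, src, dst):
    """Return the shortest list of models from src to dst, or None."""
    if src == dst:
        return [src]
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, {}):
            if nxt in parent:
                continue
            parent[nxt] = node
            if nxt == dst:
                # Walk the parent links back to src, then reverse.
                path = []
                while nxt is not None:
                    path.append(nxt)
                    nxt = parent[nxt]
                return path[::-1]
            queue.append(nxt)
    return None


# Hypothetical data: three made-up models on three made-up benchmarks.
scores = {
    "bench_x": {"small": 0.60, "mid": 0.50},
    "bench_y": {"mid": 0.70, "large": 0.65},
    "bench_z": {"large": 0.90, "small": 0.40},
}
graph = build_graph(scores)
print(shortest_chain(graph, "small", "large"))
print(shortest_chain(graph, "large", "small"))
```

With this toy data a chain exists in both directions ("small" reaches "large" through "mid", and "large" beats "small" directly), which is the sense in which the rankings are not a ladder.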