cs.AI, cs.LG

Submodular Benchmark Selection

arXiv:2605.02209v1 Announce Type: new
Abstract: Evaluating large language models across many benchmarks is expensive, yet many benchmarks are highly correlated. We formalize the selection of a small, informative subset as submodular maximization under…
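As a rough illustration of the formulation, here is a minimal sketch of greedy submodular maximization. The abstract is cut off before it names the constraint or the objective, so everything specific here is an assumption rather than the paper's method: a cardinality budget `k`, a facility-location objective over a benchmark similarity matrix, and the hypothetical names `facility_location`, `greedy_select`, and `SIM` (toy values invented for the example).

```python
from typing import List, Sequence


def facility_location(sim: Sequence[Sequence[float]], chosen: List[int]) -> float:
    """Assumed objective: each benchmark is credited with its highest
    similarity to any chosen benchmark (monotone and submodular)."""
    if not chosen:
        return 0.0
    return sum(max(row[j] for j in chosen) for row in sim)


def greedy_select(sim: Sequence[Sequence[float]], k: int) -> List[int]:
    """Repeatedly add the benchmark with the largest marginal gain,
    stopping after k picks (assumed cardinality constraint)."""
    n = len(sim)
    chosen: List[int] = []
    for _ in range(min(k, n)):
        base = facility_location(sim, chosen)
        best_j, best_gain = -1, float("-inf")
        for j in range(n):
            if j in chosen:
                continue
            gain = facility_location(sim, chosen + [j]) - base
            if gain > best_gain:
                best_j, best_gain = j, gain
        chosen.append(best_j)
    return chosen


# Toy similarity matrix, invented for illustration: benchmarks 0-2 form one
# correlated cluster and benchmarks 3-5 form another.
SIM = [
    [1.0, 0.9, 0.8, 0.1, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1, 0.1],
    [0.8, 0.9, 1.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.8, 0.5],
    [0.1, 0.1, 0.1, 0.8, 1.0, 0.8],
    [0.1, 0.1, 0.1, 0.5, 0.8, 1.0],
]

selected = greedy_select(SIM, k=2)
print(selected)
```

On this toy matrix the greedy rule picks indices 1 and 4, one representative from each correlated cluster. For a monotone submodular objective under a cardinality constraint, this greedy rule is known to reach at least a (1 - 1/e) fraction of the optimal value; whether that setting matches the paper cannot be read off the truncated abstract.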