SCAN: Structured Capability Assessment and Navigation for LLMs
arXiv:2505.06698v4 Announce Type: replace
Abstract: Evaluating Large Language Models (LLMs) has become increasingly important, with automatic evaluation benchmarks gaining prominence as alternatives to human evaluation. While existing research has foc…