Wenbo Zhang, Hengrui Cai, Wenyu Chen

Beyond the Singular: Revealing the Value of Multiple Generations in Benchmark Evaluation

Wenbo Zhang, Hengrui Cai, Wenyu Chen / May 12, 2026

arXiv:2502.08943v4 Announce Type: replace-cross
Abstract: Large language models (LLMs) have demonstrated significant utility in real-world applications, exhibiting impressive capabilities in natural language processing and understanding. Benchmark eva…
