cs.AI, cs.CL, cs.LG

Beyond the Singular: Revealing the Value of Multiple Generations in Benchmark Evaluation

arXiv:2502.08943v4 Announce Type: replace-cross
Abstract: Large language models (LLMs) have demonstrated significant utility in real-world applications, exhibiting impressive capabilities in natural language processing and understanding. Benchmark eva…