cs.CL, cs.DC, cs.LG

Spark-LLM-Eval: A Distributed Framework for Statistically Rigorous Large Language Model Evaluation

arXiv:2603.28769v1 Announce Type: cross
Abstract: Evaluating large language models at scale remains a practical bottleneck for many organizations. While existing evaluation frameworks work well for thousands of examples, they struggle when datasets gr…