Subhadip Mitra - Provide.ai

Spark-LLM-Eval: A Distributed Framework for Statistically Rigorous Large Language Model Evaluation

Subhadip Mitra / April 1, 2026

arXiv:2603.28769v1 Announce Type: cross
Abstract: Evaluating large language models at scale remains a practical bottleneck for many organizations. While existing evaluation frameworks work well for thousands of examples, they struggle when datasets gr…
