Deepak Pandita, Flip Korn, Chris Welty, Christopher M. Homan

Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling

May 14, 2026

arXiv:2605.13801v1 Announce Type: cross
Abstract: As generative AI models such as large language models (LLMs) become more pervasive, ensuring the safety, robustness, and overall trustworthiness of these systems is paramount. However, AI is currently …
