cs.AI, cs.CL

Robust LLM Performance Certification via Constrained Maximum Likelihood Estimation

arXiv:2604.03257v1 Announce Type: new
Abstract: The ability to rigorously estimate the failure rates of large language models (LLMs) is a prerequisite for their safe deployment. Currently, however, practitioners often face a tradeoff between expensive…