When Can We Trust LLM Graders? Calibrating Confidence for Automated Assessment
arXiv:2603.29559v1 Announce Type: new
Abstract: Large Language Models (LLMs) show promise for automated grading, but their outputs can be unreliable. Rather than improving grading accuracy directly, we address a complementary problem: predicti…