Autorubric: Unifying Rubric-based LLM Evaluation
arXiv:2603.00077v2 Announce Type: replace
Abstract: Techniques for reliable rubric-based LLM evaluation (ensemble judging, bias mitigation, few-shot calibration) are scattered across papers with inconsistent terminology and partial implementations…