cs.CL

No Free Labels: Limitations of LLM-as-a-Judge Without Human Grounding

arXiv:2503.05061v2 Announce Type: replace
Abstract: Reliable evaluation of large language models (LLMs) is critical as their deployment rapidly expands, particularly in high-stakes domains such as business and finance. The LLM-as-a-Judge framework, wh…