cs.AI, cs.LG, stat.ML

LLMs Judging LLMs: A Simplex Perspective

arXiv:2505.21972v3
Abstract: Given the challenge of automatically evaluating free-form outputs from large language models (LLMs), an increasingly common solution is to use LLMs themselves as the judging mechanism, without …