Is it just me, or are all the public LLM judges just bad? [D]
I see a lot of people trying to justify using LLM judges, including NeurIPS. Well, tbh PAT was impressive tho. However, all the available LLMs I tested (the highest versions of Claude, Gemini and GPT) are trash if the paper is a h…