On Cost-Effective LLM-as-a-Judge Improvement Techniques
arXiv:2604.13717v2 Announce Type: replace
Abstract: Using a language model to score or rank candidate responses has become a scalable alternative to human evaluation in reinforcement learning from human feedback (RLHF) pipelines, benchmarking, and app…