cs.AI, cs.CL, cs.LG

On the Shelf Life of Fine-Tuned LLM-Judges: Future-Proofing, Backward-Compatibility, and Question Generalization

arXiv:2509.23542v2 Announce Type: replace
Abstract: The LLM-as-a-judge paradigm is widely used in both evaluating free-text model responses and reward modeling for model alignment and fine-tuning. Recently, fine-tuning judges with judge-specific data …