cs.AI

Rubric-Grounded RL: Structured Judge Rewards for Generalizable Reasoning

arXiv:2605.08061v1 Announce Type: new
Abstract: We argue that decomposing reward into weighted, verifiable criteria and using an LLM judge to score them provides a partial-credit optimization signal: instead of a binary outcome or a single holistic sc…
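
To illustrate the partial-credit idea the abstract describes, here is a minimal sketch, not the paper's implementation. It assumes the reward is a weight-normalized sum of per-criterion judge scores in [0, 1]. The names `Criterion`, `rubric_reward`, and `keyword_judge` are hypothetical. The keyword check is only a deterministic stand-in for an LLM judge, so that the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Criterion:
    """One rubric item. All fields are illustrative assumptions."""
    name: str
    weight: float
    keyword: str  # used only by the stand-in judge below


# A judge maps (response, criterion) to a score; an LLM call would go here.
Judge = Callable[[str, Criterion], float]


def keyword_judge(response: str, criterion: Criterion) -> float:
    """Stand-in for an LLM judge: 1.0 if the keyword appears, else 0.0."""
    return 1.0 if criterion.keyword in response.lower() else 0.0


def rubric_reward(response: str, rubric: Sequence[Criterion], judge: Judge) -> float:
    """Weight-normalized sum of per-criterion scores, in [0, 1]."""
    total_weight = sum(c.weight for c in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    weighted = 0.0
    for c in rubric:
        # Clamp so a misbehaving judge cannot push the reward out of range.
        score = min(1.0, max(0.0, judge(response, c)))
        weighted += c.weight * score
    return weighted / total_weight


EXAMPLE_RUBRIC = [
    Criterion("final answer stated", 2.0, "answer:"),
    Criterion("units given", 1.0, "meters"),
    Criterion("work shown", 1.0, "step 1"),
]

EXAMPLE_RESPONSE = "Step 1: 6 * 7 = 42. Answer: 42"

if __name__ == "__main__":
    # Two of three criteria are met, carrying 3 of 4 total weight units.
    print(rubric_reward(EXAMPLE_RESPONSE, EXAMPLE_RUBRIC, keyword_judge))  # 0.75
```

Under an all-or-nothing outcome reward, the example response would score 0 because one criterion is unmet. Under the weighted rubric, it keeps credit for the criteria it does satisfy.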