Manish Bhattarai, Ismael Boureima, Nishath Rajiv Ranasinghe, Scott Pakin, Dan O'Malley

Rubric-Grounded RL: Structured Judge Rewards for Generalizable Reasoning

May 11, 2026

arXiv:2605.08061v1 Announce Type: new
Abstract: We argue that decomposing reward into weighted, verifiable criteria and using an LLM judge to score them provides a partial-credit optimization signal: instead of a binary outcome or a single holistic sc…
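
To make the partial-credit idea in the abstract concrete, here is a minimal sketch of a rubric reward, assuming each criterion carries a positive weight and the judge returns a score in [0, 1] per criterion. The names `Criterion` and `rubric_reward`, the example rubric, and the weighted-average normalization are illustrative assumptions, not the paper's formulation. The hard-coded scores stand in for the output of an LLM judge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One verifiable rubric item (hypothetical structure, for illustration)."""
    name: str
    weight: float  # relative importance of this criterion


def rubric_reward(criteria, judge_scores):
    """Return the weighted average of per-criterion judge scores.

    Scores are clipped to [0, 1]; a criterion the judge did not score
    counts as 0. This normalization is an assumption of the sketch.
    """
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("rubric needs a positive total weight")
    weighted = sum(
        c.weight * min(1.0, max(0.0, judge_scores.get(c.name, 0.0)))
        for c in criteria
    )
    return weighted / total_weight


# Hypothetical rubric and judge scores for a single model response.
rubric = [
    Criterion("states_assumptions", 1.0),
    Criterion("derivation_is_valid", 2.0),
    Criterion("final_answer_correct", 1.0),
]
scores = {
    "states_assumptions": 1.0,
    "derivation_is_valid": 0.5,
    "final_answer_correct": 0.0,
}

reward = rubric_reward(rubric, scores)
print(reward)  # 0.5
```

A binary outcome reward would give this response 0 because the final answer is wrong. The rubric reward still credits the stated assumptions and the partly valid derivation, which is the kind of graded signal the abstract describes.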
