Author name: Indraneil Paul, Goran Glavaš, Iryna Gurevych

Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

Indraneil Paul, Goran Glavaš, Iryna Gurevych / May 5, 2026

arXiv:2605.00754v2 Announce Type: replace-cross
Abstract: Reward models (RMs) have become an indispensable fixture of the language model (LM) post-training playbook, enabling policy alignment and test-time scaling. Research on the application of RMs i…

cs.LG, cs.SE

Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

Indraneil Paul, Goran Glavaš, Iryna Gurevych / May 4, 2026

arXiv:2605.00754v1 Announce Type: cross
Abstract: Reward models (RMs) have become an indispensable fixture of the language model (LM) post-training playbook, enabling policy alignment and test-time scaling. Research on the application of RMs in code g…