Reason Only When Needed: Efficient Generative Reward Modeling via Model-Internal Uncertainty
arXiv:2604.10072v2 Announce Type: replace
Abstract: Recent advances in Generative Reward Models (GRMs) have demonstrated their potential to enhance the reasoning abilities of LLMs through Chain-of-Thought (CoT) prompting. Despite these gains, exis…