Weiqin Wang, Yile Wang, Kehao Chen, Hui Huang

Beyond Majority Voting: Towards Fine-grained and More Reliable Reward Signal for Test-Time Reinforcement Learning

Weiqin Wang, Yile Wang, Kehao Chen, Hui Huang / April 23, 2026

arXiv:2512.15146v3 Announce Type: replace
Abstract: Test-time reinforcement learning mitigates the reliance on annotated data by using majority voting results as pseudo-labels, emerging as a complementary direction to reinforcement learning with verif…
