cs.AI, cs.LG

What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time

arXiv:2603.19880v2 Announce Type: replace
Abstract: Test-Time Reinforcement Learning (TTRL) enables Large Language Models (LLMs) to enhance their reasoning capabilities on unlabeled test streams by deriving pseudo-rewards from majority-voting consensus. How…
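
As a rough illustration of the majority-voting pseudo-reward mentioned in the abstract, the sketch below takes several sampled answers to one unlabeled question, treats the most frequent answer as a pseudo-label, and scores each sample against it. The function name `majority_vote_rewards`, the binary 1.0/0.0 reward, exact string matching, and the tie-breaking rule are all assumptions made for this example, not details taken from the paper.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Hypothetical helper: derive pseudo-rewards from sampled answers.

    answers: list of final answers (strings) sampled for one question.
    Returns (pseudo_label, rewards), where rewards[i] is 1.0 if answers[i]
    equals the majority answer and 0.0 otherwise (an assumed reward shape).
    """
    if not answers:
        raise ValueError("need at least one sampled answer")

    # Count how often each distinct answer appears.
    counts = Counter(answers)

    # The most frequent answer acts as the pseudo-label. On ties,
    # Counter.most_common keeps the answer that was seen first.
    pseudo_label, _ = counts.most_common(1)[0]

    # Assumed binary reward: agreement with the consensus answer.
    rewards = [1.0 if a == pseudo_label else 0.0 for a in answers]
    return pseudo_label, rewards


if __name__ == "__main__":
    samples = ["42", "41", "42", "7", "42"]
    label, rewards = majority_vote_rewards(samples)
    print(label, rewards)
```

Under these assumptions, every sample that agrees with the consensus is rewarded whether or not the consensus is correct, so a wrong majority answer would be reinforced as strongly as a right one.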