cs.AI

ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement

arXiv:2604.01591v2 Announce Type: replace
Abstract: We introduce ThinkTwice, a simple two-phase framework, based on Group Relative Policy Optimization (GRPO), that jointly optimizes LLMs to solve reasoning problems and to refine their own answers. In each pair …
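
For readers unfamiliar with GRPO, the sketch below illustrates only its group-relative advantage step: several responses are sampled for one prompt, and each reward is normalized against the mean and standard deviation of its own group. It is not ThinkTwice's implementation. The function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the binary rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style).

    Hypothetical helper: `eps` and the choice of population standard
    deviation are illustrative assumptions, not taken from the abstract.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed toy rewards for four sampled answers to one prompt (1 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct answers receive positive advantages and incorrect ones negative, so the policy update favors responses that beat their group's average without needing a separate value model.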