Muhammed Emrullah Ildiz, Halil Alperen Gozeten, Ege Onur Taga, Samet Oymak

Learning to Correct: Calibrated Reinforcement Learning for Multi-Attempt Chain-of-Thought

Muhammed Emrullah Ildiz, Halil Alperen Gozeten, Ege Onur Taga, Samet Oymak / April 21, 2026

arXiv:2604.17912v1
Abstract: State-of-the-art reasoning models utilize long chain-of-thought (CoT) to solve increasingly complex problems using more test-time computation. In this work, we explore a long CoT setting where the model …
