Chu-Cheng Lin, Eugene Ie

How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum

Chu-Cheng Lin, Eugene Ie / April 29, 2026

arXiv:2604.25907v1 Announce Type: new
Abstract: Adapting reasoning models to new tasks during post-training with only output-level supervision stalls under reinforcement learning from verifiable rewards (RLVR) when the initial success probability $p_0…
