Apriel-1.5-OpenReasoner: RL Post-Training for General-Purpose and Efficient Reasoning
arXiv:2604.02007v2
Abstract: Building general-purpose reasoning models using reinforcement learning with verifiable rewards (RLVR) across diverse domains has been widely adopted by frontier open-weight models. However, their tra…