Stabilizing Efficient Reasoning with Step-Level Advantage Selection
arXiv:2604.24003v1 Announce Type: new
Abstract: Large language models (LLMs) achieve strong reasoning performance by allocating substantial computation at inference time, often generating long and verbose reasoning traces. While recent work on efficie…