A Model Can Help Itself: Reward-Free Self-Training for LLM Reasoning
arXiv:2510.18814v2 Announce Type: replace-cross
Abstract: Can language models improve their reasoning performance without external rewards, using only their own sampled responses for training? We show that they can. We propose Self-evolving Post-Train…
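The abstract does not say which reward-free signal the method uses, but a common way to self-train without external rewards is to treat the majority answer among a model's own samples as a pseudo-label (self-consistency filtering) and fine-tune on the agreeing samples. The sketch below illustrates that general idea only; all function names are hypothetical and `sample_answers` is a toy stand-in for an LLM, not the paper's method.

```python
import random
from collections import Counter

def sample_answers(question, n=8, rng=None):
    # Toy stand-in for sampling an LLM: returns n noisy integer
    # answers, correct (5) with probability 0.7, else a distractor.
    rng = rng or random.Random(0)
    return [5 if rng.random() < 0.7 else rng.choice([4, 6]) for _ in range(n)]

def majority_vote_pseudolabel(samples):
    # Reward-free training signal: the most frequent answer among
    # the model's own samples acts as the pseudo-label.
    answer, _count = Counter(samples).most_common(1)[0]
    return answer

def build_self_training_set(questions, n=8, seed=0):
    # Collect (question, response) pairs whose answer agrees with
    # the per-question consensus; these would be fine-tuning data.
    rng = random.Random(seed)
    data = []
    for q in questions:
        samples = sample_answers(q, n=n, rng=rng)
        label = majority_vote_pseudolabel(samples)
        data.extend((q, s) for s in samples if s == label)
    return data
```

By construction, every retained response per question carries the same (consensus) answer, so the model is trained toward its own most self-consistent behavior rather than toward an external reward.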