Power Distribution Bridges Sampling, Self-Reward RL, and Self-Distillation
arXiv:2605.04542v1 Announce Type: new
Abstract: Recent analyses question whether reinforcement learning (RL) is responsible for strong reasoning in large language models (LLMs). At the same time, distillation and inference-time sampling, including pow…