Training with Harnesses: On-Policy Harness Self-Distillation for Complex Reasoning
arXiv:2605.08741v1 Announce Type: new
Abstract: Inference-time harnesses substantially improve large language models on complex reasoning tasks. However, the intrinsic capabilities of the underlying model remain unchanged by the addition of these exte…