Zhengyang Zhao, Lu Ma, Wentao Zhang

Training with Harnesses: On-Policy Harness Self-Distillation for Complex Reasoning

May 12, 2026

arXiv:2605.08741v1
Abstract: Inference-time harnesses substantially improve large language models on complex reasoning tasks. However, the intrinsic capabilities of the underlying model remain unchanged by the addition of these exte…
