Zhiquan Tan, Yinrong Hong

PAINT: Partial-Solution Adaptive Interpolated Training for Self-Distilled Reasoners

Zhiquan Tan, Yinrong Hong / April 30, 2026

arXiv:2604.26573v1
Abstract: Improving large language model (LLM) reasoning requires supervision that is both aligned with the model’s own test-time states and informative at the token level. Reinforcement learning with verifiable r…
