PAINT: Partial-Solution Adaptive Interpolated Training for Self-Distilled Reasoners
arXiv:2604.26573v1 Announce Type: new
Abstract: Improving large language model (LLM) reasoning requires supervision that is both aligned with the model’s own test-time states and informative at the token level. Reinforcement learning with verifiable r…