Peer-Predictive Self-Training for Language Model Reasoning
arXiv:2604.13356v1 Announce Type: cross
Abstract: Designing mechanisms for continued self-improvement of language models without external supervision remains an open challenge. We propose Peer-Predictive Self-Training (PST), a label-free fine-tuning framework in…