cs.LG

Scaling Self-Play with Self-Guidance

arXiv:2604.20209v1 Announce Type: new
Abstract: LLM self-play algorithms are notable in that, in principle, nothing bounds their learning: a Conjecturer model creates problems for a Solver, and both improve together. However, in practice, existing LLM…