Primal Generation, Dual Judgment: Self-Training from Test-Time Scaling
arXiv:2605.11299v2 Announce Type: replace
Abstract: Code generation is typically trained in the primal space of programs: a model produces a candidate solution and receives sparse execution feedback, often a single pass/fail bit. Test-time scaling enr…