cs.LG

ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation

arXiv:2604.03922v1 Announce Type: new
Abstract: Selecting LLM-generated code candidates using LLM-generated tests is challenging because the tests themselves may be incorrect. Existing methods either treat all tests equally or rely on ad-hoc heuristic…
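The abstract is truncated before it states the paper's scoring rule, so what follows is only a hypothetical Python sketch and not ACES itself. It contrasts two things. The first is the equal-weight baseline the abstract mentions, where each LLM-generated test counts once. The second is one possible reading of the title's "leave-one-out AUC consistency": each test is weighted by how well its pass/fail outcomes rank the candidates that do well on the remaining tests. The function names (`equal_weight_select`, `loo_auc_weights`, `weighted_select`) and the rule of counting only a test's margin above chance (AUC - 0.5) are illustrative assumptions.

```python
"""Hypothetical sketch: equal-weight test voting vs. leave-one-out AUC test weighting.

Not the paper's algorithm; the weighting and scoring rules are illustrative assumptions.
"""
from typing import List, Sequence

Matrix = Sequence[Sequence[bool]]  # passes[i][j] is True if candidate i passes test j


def equal_weight_select(passes: Matrix) -> int:
    """Baseline: every test counts once; pick the candidate passing the most tests."""
    scores = [sum(row) for row in passes]
    return max(range(len(scores)), key=lambda i: scores[i])


def auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """Mann-Whitney AUC: P(random positive outscores random negative), ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        return 0.5  # a test every candidate passes (or fails) gives no ranking signal
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def loo_auc_weights(passes: Matrix) -> List[float]:
    """For each test j, the AUC of its pass/fail labels against pass counts on all other tests."""
    n_tests = len(passes[0])
    weights = []
    for j in range(n_tests):
        labels = [row[j] for row in passes]
        rest = [sum(row) - row[j] for row in passes]  # leave test j out of the score
        weights.append(auc(labels, rest))
    return weights


def weighted_select(passes: Matrix) -> int:
    """Assumed rule: a passed test contributes only its margin above chance, max(AUC - 0.5, 0)."""
    w = loo_auc_weights(passes)
    scores = [sum(max(wj - 0.5, 0.0) for wj, ok in zip(w, row) if ok) for row in passes]
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Toy matrix: 3 candidates (rows) x 4 generated tests (columns).
    passes = [
        [True, True, True, False],
        [True, True, False, False],
        [False, False, False, True],
    ]
    print(loo_auc_weights(passes))  # [0.75, 0.75, 0.75, 0.0]
    print(equal_weight_select(passes), weighted_select(passes))  # 0 0
```

In this toy matrix the last test is passed only by the candidate that fails every other test, so its leave-one-out AUC is 0.0 and it adds nothing to any candidate's weighted score, whereas the equal-weight baseline still counts it as a full point.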