cs.AI, cs.SE

A Time-Consistent Benchmark for Repository-Level Software Engineering Evaluation

arXiv:2603.26137v1 Announce Type: cross
Abstract: Evaluation of repository-aware software engineering systems is often confounded by synthetic task design, prompt leakage, and temporal contamination between repository knowledge and future code changes…