cs.LG, cs.SE

Investigating Test Overfitting on SWE-bench

arXiv:2511.16858v3 Announce Type: replace-cross
Abstract: Tests can help resolve issues in code repositories. However, relying too heavily on tests for issue resolution can lead to code that technically passes the observed tests but actually m…