cs.CL

RExBench: Can coding agents autonomously implement AI research extensions?

arXiv:2506.22598v3 Announce Type: replace
Abstract: Agents based on Large Language Models (LLMs) have shown promise for performing sophisticated software engineering tasks autonomously. In addition, there has been progress towards developing agents th…