cs.AI, cs.SE

Beyond Isolated Tasks: A Framework for Evaluating Coding Agents on Sequential Software Evolution

arXiv:2604.03035v1 Announce Type: cross
Abstract: Existing datasets for coding agents evaluate performance on isolated, single-pull-request (PR) tasks in a stateless manner, failing to capture real-world software development, where code …