Beyond Isolated Tasks: A Framework for Evaluating Coding Agents on Sequential Software Evolution
arXiv:2604.03035v1 Announce Type: cross
Abstract: Existing datasets for coding agents evaluate performance on isolated, single-pull-request (PR) tasks in a stateless manner, failing to capture real-world software development, where code …