The Conversations Beneath the Code: Triadic Data for Long-Horizon Software Engineering Agents
arXiv:2605.02244v1
Abstract: Frontier software engineering agents have saturated short-horizon benchmarks while regressing on the work that constitutes senior engineering: long-horizon, multi-engineer, ambiguous-specification deli…