cs.AI

Agent psychometrics: Task-level performance prediction in agentic coding benchmarks

arXiv:2604.00594v1 Announce Type: new
Abstract: As the focus in LLM-based coding shifts from static single-step code generation to multi-step agentic interaction with tools and environments, understanding which tasks will challenge agents and why beco…