The Amazing Agent Race: Strong Tool Users, Weak Navigators
arXiv:2604.10261v2
Abstract: Existing tool-use benchmarks for LLM agents are overwhelmingly linear: our analysis of six benchmarks shows 55 to 100% of instances are simple chains of 2 to 5 steps. We introduce The Amazing Agent R…