EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments
arXiv:2506.08136v3
Abstract: We introduce EconWebArena, a benchmark for evaluating autonomous agents on complex, multimodal economic tasks in realistic web environments. The benchmark comprises 360 curated tasks from 82 authorit…