Lawrence Keunho Jang, Jing Yu Koh, Daniel Fried, Ruslan Salakhutdinov

Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks

Lawrence Keunho Jang, Jing Yu Koh, Daniel Fried, Ruslan Salakhutdinov / April 29, 2026

arXiv:2604.24964v1 Announce Type: cross
Abstract: Existing web agent benchmarks have largely converged on short, single-site tasks on which frontier models are approaching saturation. However, real-world web use consists of long-horizon, multi-site wor…
