cs.AI, cs.CV

WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark

arXiv:2604.10988v1
Abstract: Existing browser agent benchmarks face a fundamental trilemma: real-website benchmarks lack reproducibility due to content drift, controlled environments sacrifice realism by omitting real-web noise, and…