WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web Testing
arXiv:2603.25226v1 Announce Type: cross
Abstract: The emergence of Large Language Models (LLMs) has catalyzed a paradigm shift in programming, giving rise to “vibe coding”, where users can build complete projects and even control computers using natur…