cs.AI, cs.CR, cs.LG

WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks

arXiv:2604.06367v1 Announce Type: cross
Abstract: Web agents automate browser tasks, ranging from simple form completion to complex workflows like ordering groceries. While current benchmarks evaluate general-purpose performance~(e.g., WebArena) or sa…