Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values
arXiv:2605.10365v1 Announce Type: new
Abstract: Autonomous agents have rapidly matured as task executors and seen widespread deployment via harnesses such as OpenClaw. Safety concerns have rightly drawn growing research attention, and beneath them lie…