cs.AI, cs.LG

HINTBench: Horizon-agent Intrinsic Non-attack Trajectory Benchmark

arXiv:2604.13954v1 Announce Type: cross
Abstract: Existing agent-safety evaluation has focused mainly on externally induced risks. Yet agents may still enter unsafe trajectories under benign conditions. We study this complementary but underexplored se…