SeekerGym: A Benchmark for Reliable Information Seeking
arXiv:2604.17143v1 Announce Type: new
Abstract: Despite their substantial successes, AI agents continue to face fundamental challenges in trustworthiness. Consider deep research agents, tasked with searching for information relevant to a give…