Oliver Bentham, Vivek Srikumar

InfiniteScienceGym: An Unbounded, Procedurally-Generated Benchmark for Scientific Analysis

April 16, 2026

arXiv:2604.13201v1 Announce Type: cross
Abstract: Large language models are emerging as scientific assistants, but evaluating their ability to reason from empirical data remains challenging. Benchmarks derived from published studies and human annotati…
