cs.AI, cs.CL

InfiniteScienceGym: An Unbounded, Procedurally-Generated Benchmark for Scientific Analysis

arXiv:2604.13201v1 Announce Type: cross
Abstract: Large language models are emerging as scientific assistants, but evaluating their ability to reason from empirical data remains challenging. Benchmarks derived from published studies and human annotati…