Stargazer: A Scalable Model-Fitting Benchmark Environment for AI Agents under Astrophysical Constraints
arXiv:2604.15664v1 Announce Type: new
Abstract: The rise of autonomous AI agents suggests that dynamic benchmark environments with built-in feedback on scientifically grounded tasks are needed to evaluate the capabilities of these agents in research w…