A Judge Agent Closes the Reliability Gap in AI-Generated Scientific Simulation
arXiv:2603.25780v1 Announce Type: cross
Abstract: Large language models can generate scientific simulation code, but the generated code silently fails on most non-textbook problems. We show that classical mathematical validation — well-posedness, con…