EternalMath: A Living Benchmark of Frontier Mathematics that Evolves with Human Discovery
arXiv:2601.01400v2
Abstract: Current evaluations of mathematical reasoning in large language models (LLMs) are dominated by static benchmarks, either derived from competition-style problems or curated through costly expert effor…