cs.CL

How Far Are We? Systematic Evaluation of LLMs vs. Human Experts in Mathematical Contest in Modeling

arXiv:2604.04791v1 Announce Type: new
Abstract: Large language models (LLMs) have achieved strong performance on reasoning benchmarks, yet their ability to solve real-world problems requiring end-to-end workflows remains unclear. Mathematical modeling…