Stress-Testing the Reasoning Competence of LLMs With Proofs Under Minimal Formalism
arXiv:2605.12524v1 Announce Type: cross
Abstract: We introduce ProofGrid, a benchmark suite for evaluating LLM reasoning through machine-checkable proofs rather than final answers alone. ProofGrid contains 15 tasks spanning proof writing, proof checki…