cs.AI, cs.LO

Stress-Testing the Reasoning Competence of LLMs With Proofs Under Minimal Formalism

arXiv:2605.12524v1 Announce Type: cross
Abstract: We introduce ProofGrid, a benchmark suite for evaluating LLM reasoning through machine-checkable proofs rather than final answers alone. ProofGrid contains 15 tasks spanning proof writing, proof checki…