Konstantine Arkoudas, Serafim Batzoglou

Stress-Testing the Reasoning Competence of LLMs With Proofs Under Minimal Formalism

Konstantine Arkoudas, Serafim Batzoglou / May 14, 2026

arXiv:2605.12524v1 Announce Type: cross
Abstract: We introduce ProofGrid, a benchmark suite for evaluating LLM reasoning through machine-checkable proofs rather than final answers alone. ProofGrid contains 15 tasks spanning proof writing, proof checki…