cs.LG

Evaluating the Architectural Reasoning Capabilities of LLM Provers via the Obfuscated Natural Number Game

arXiv:2605.00677v1 Announce Type: new
Abstract: While Large Language Models have achieved notable success on formal mathematics benchmarks such as MiniF2F, it remains unclear whether these results stem from genuine logical reasoning or semantic patter…