cs.CL

Counting as a minimal probe of language model reliability

arXiv:2605.02028v1 Announce Type: new
Abstract: Large language models perform strongly on benchmarks in mathematical reasoning, coding and document analysis, suggesting a broad ability to follow instructions. However, it remains unclear whether such s…