cs.AI, cs.CL

Is Large Language Model Performance on Reasoning Tasks Impacted by Different Ways Questions Are Asked?

arXiv:2507.15707v2 Announce Type: replace
Abstract: Large Language Models (LLMs) have been evaluated using diverse question types, e.g., multiple-choice, true/false, and short- or long-answer questions. This study addresses an unexplored question about the impact of…