Is Large Language Model Performance on Reasoning Tasks Impacted by Different Ways Questions Are Asked?
arXiv:2507.15707v2
Abstract: Large Language Models (LLMs) have been evaluated using diverse question types, e.g., multiple-choice, true/false, and short/long answers. This study answers an unexplored question about the impact of…