cs.CL

Reasoning Gets Harder for LLMs Inside A Dialogue

arXiv:2603.20133v2
Abstract: Large Language Models (LLMs) achieve strong performance on many reasoning benchmarks, yet these evaluations typically focus on isolated tasks that differ from real-world usage in task-oriented dialog…