cs.AI, cs.CL

Qworld: Question-Specific Evaluation Criteria for LLMs

arXiv:2603.23522v1 Announce Type: cross
Abstract: Evaluating large language models (LLMs) on open-ended questions is difficult because response quality depends on the question’s context. Binary scores and static rubrics fail to capture these context-d…