CQA-Eval: Designing Reliable Evaluations of Multi-paragraph Clinical QA under Resource Constraints
arXiv:2510.10415v3 Announce Type: replace-cross
Abstract: Evaluating multi-paragraph clinical question answering (QA) systems is resource-intensive and challenging: accurate judgments require medical expertise, and achieving consistent human judgments …