Automated Rubrics for Reliable Evaluation of Medical Dialogue Systems
arXiv:2601.15161v2 Announce Type: replace-cross
Abstract: Large Language Models (LLMs) are increasingly used for clinical decision support, where hallucinations and unsafe suggestions may pose direct risks to patient safety. These risks are hard to as…