Junze Ye, Daniel Tawfik, Alex J. Goodell, Nikhil V. Kotha, Mark K. Buyyounouski, Mohsen Bayati

Scalable Stewardship of an LLM-Assisted Clinical Benchmark with Physician Oversight

April 14, 2026

arXiv:2512.19691v3
Abstract: Reference labels for machine-learning benchmarks are increasingly synthesized with LLM assistance, but their reliability remains underexamined. We audit MedCalc-Bench, a clinical benchmark for medica…