A study published in Science by researchers at Harvard Medical School and Beth Israel Deaconess Medical Center found that OpenAI’s o1 model outperformed internal medicine physicians in emergency room diagnostic accuracy, identifying the exact or a very close diagnosis in 67% of triage cases, compared with 55% and 50% for the two attending physicians assessed.
Lead author Arjun Manrai said the AI surpassed both prior models and physician benchmarks across nearly every test. The researchers stressed that the models received the same information physicians saw in electronic records, with no preprocessing.
However, the study stopped well short of advocating clinical deployment, calling instead for formal prospective trials. Critics, including emergency physician Kristen Panthagani, cautioned that comparing AI with non-specialist physicians and equating diagnostic guessing with genuine emergency care were significant methodological limitations.