Medmarks: A Comprehensive Open-Source LLM Benchmark Suite for Medical Tasks
arXiv:2605.01417v1 Announce Type: cross
Abstract: Evaluating large language models (LLMs) for medical applications remains challenging due to benchmark saturation, limited data accessibility, and insufficient coverage of relevant tasks. Existing suite…