MedConclusion: A Benchmark for Biomedical Conclusion Generation from Structured Abstracts
arXiv:2604.06505v1 Announce Type: cross
Abstract: Large language models (LLMs) are widely explored for reasoning-intensive research tasks, yet resources for testing whether they can infer scientific conclusions from structured biomedical evidence rema…