Lech Madeyski, Barbara Kitchenham, Martin Shepperd

LLM4SCREENLIT: Recommendations on Assessing the Performance of Large Language Models for Screening Literature in Systematic Reviews

Lech Madeyski, Barbara Kitchenham, Martin Shepperd / April 28, 2026

arXiv:2511.12635v2
Abstract: Context: Large language models (LLMs) are increasingly used to screen literature for systematic reviews (SRs), but the standard confusion-matrix metrics used to evaluate them can mislead under …
