Andreas Waldis, Yotam Perlitz, Leshem Choshen, Yufang Hou, Iryna Gurevych

Holmes: A Benchmark to Assess the Linguistic Competence of Language Models

Andreas Waldis, Yotam Perlitz, Leshem Choshen, Yufang Hou, Iryna Gurevych / May 12, 2026

arXiv:2404.18923v5 Announce Type: replace
Abstract: We introduce Holmes, a new benchmark designed to assess language models' (LMs') linguistic competence – their unconscious understanding of linguistic phenomena. Specifically, we use classifier-based pr…