Author name: Simon Roth

Which Leakage Types Matter?

Simon Roth / April 7, 2026

arXiv:2604.04199v1 Announce Type: new
Abstract: Twenty-eight within-subject counterfactual experiments across 2,047 tabular datasets, plus a boundary experiment on 129 temporal datasets, measuring the severity of four data leakage classes in machine l…

cs.LG

A Grammar of Machine Learning Workflows

Simon Roth / April 7, 2026

arXiv:2603.10742v3 Announce Type: replace
Abstract: Data leakage has been identified in 648 published machine learning papers across 30 scientific fields. The knowledge to prevent it exists; the tools do not enforce it. This paper presents a grammar -…