Detecting Data Contamination in Large Language Models
arXiv:2604.19561v1 Announce Type: new
Abstract: Large Language Models (LLMs) are trained on large amounts of data, some of which may come from copyrighted sources. Membership Inference Attacks (MIAs) aim to detect those documents and whethe…
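The core idea behind a loss-based membership inference attack can be sketched with a toy example: documents the model was trained on tend to receive lower loss (lower average negative log-likelihood) than unseen documents, so thresholding the loss gives a membership signal. The snippet below is a minimal illustration, not the paper's method; the character-bigram "model", the smoothing, and all thresholds are hypothetical stand-ins for a real LLM and its per-token loss.

```python
import math
from collections import Counter, defaultdict

def train_bigram(docs):
    """Count character bigrams over the 'member' training corpus
    (stand-in for LLM pretraining)."""
    counts = defaultdict(Counter)
    for doc in docs:
        for a, b in zip(doc, doc[1:]):
            counts[a][b] += 1
    return counts

def avg_nll(doc, counts, vocab_size=128, alpha=1.0):
    """Average negative log-likelihood of a document under the bigram
    model, with add-alpha smoothing so unseen bigrams get nonzero
    probability (analogous to per-token loss under an LLM)."""
    total, n = 0.0, 0
    for a, b in zip(doc, doc[1:]):
        c = counts.get(a, Counter())
        p = (c[b] + alpha) / (sum(c.values()) + alpha * vocab_size)
        total += -math.log(p)
        n += 1
    return total / max(n, 1)

def is_member(doc, counts, threshold):
    """Loss-threshold MIA: flag the document as a likely training
    member when its loss falls below the chosen threshold."""
    return avg_nll(doc, counts) < threshold

# Toy corpus: the 'seen' text reuses training bigrams, the 'unseen'
# text does not, so its loss is markedly higher.
train_docs = ["the quick brown fox jumps over the lazy dog"] * 5
counts = train_bigram(train_docs)
seen = "the quick brown fox"
unseen = "zqxv kjwpf mzzrty qqq"
print(avg_nll(seen, counts) < avg_nll(unseen, counts))  # True
```

In practice the gap between member and non-member loss can be small and threshold calibration is the hard part, which is exactly the difficulty MIA work on LLMs studies.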