Beyond Public Access in LLM Pre-Training Data
arXiv:2505.00020v2 Announce Type: replace
Abstract: Using a legally obtained dataset of 34 copyrighted O’Reilly Media books, we apply the DE-COP membership inference attack method to investigate whether OpenAI’s large language models show recognition …