cs.AI, cs.CL

Beyond Public Access in LLM Pre-Training Data

arXiv:2505.00020v2 Announce Type: replace
Abstract: Using a legally obtained dataset of 34 copyrighted O’Reilly Media books, we apply the DE-COP membership inference attack method to investigate whether OpenAI’s large language models show recognition …