Sruly Rosenblat, Tim O'Reilly, Ilan Strauss

Beyond Public Access in LLM Pre-Training Data

Sruly Rosenblat, Tim O'Reilly, Ilan Strauss / May 7, 2026

arXiv:2505.00020v2
Abstract: Using a legally obtained dataset of 34 copyrighted O’Reilly Media books, we apply the DE-COP membership inference attack method to investigate whether OpenAI’s large language models show recognition …
