cs.AI, cs.CL

Hey, That’s My Data! Token-Only Dataset Inference in Large Language Models

arXiv:2506.06057v2 Announce Type: replace-cross
Abstract: Large Language Models (LLMs) rely on massive training datasets, often including proprietary data, which raises concerns about unauthorized usage and copyright infringement. Existing dataset inf…