cs.LG

Dataset Watermarking for Closed LLMs with Provable Detection

arXiv:2605.06865v1 Announce Type: new
Abstract: Large language models (LLMs) are pre-trained and post-trained on vast amounts of loosely curated data, raising the possibility that these models may have been trained on proprietary datasets or the same …