Pengrun Huang, Kamalika Chaudhuri, Yu-Xiang Wang

Dataset Watermarking for Closed LLMs with Provable Detection

May 11, 2026

arXiv:2605.06865v1 Announce Type: new
Abstract: Large language models (LLMs) are pre-trained and post-trained on vast amounts of loosely curated data, raising the possibility that these models may have been trained on proprietary datasets or the same …
