cs.AI, cs.CL

Data Compressibility Quantifies LLM Memorization

arXiv:2507.06056v4 Announce Type: replace
Abstract: Large Language Models (LLMs) are known to memorize portions of their training data, sometimes even reproducing content verbatim when prompted appropriately. Despite substantial interest, existing LLM m…