Learning the Signature of Memorization in Autoregressive Language Models
arXiv:2604.03199v1 Announce Type: cross
Abstract: All prior membership inference attacks for fine-tuned language models use hand-crafted heuristics (e.g., loss thresholding, Min-K%, reference calibration), each bounded by the designer’s intuition. We…
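
For context, the three hand-crafted heuristics named in the abstract can be sketched as simple scoring functions over per-token log-probabilities. This is a minimal illustration of how such heuristics are commonly formulated, not the paper's method or code: the function names, the `k` default, and the thresholding helper are all hypothetical, and the per-token log-probabilities are assumed to have been computed elsewhere by the target (and reference) model.

```python
from typing import Sequence


def loss_score(token_logprobs: Sequence[float]) -> float:
    """Average negative log-likelihood of a sequence.

    Loss thresholding treats a LOWER value as evidence of membership.
    """
    return -sum(token_logprobs) / len(token_logprobs)


def min_k_score(token_logprobs: Sequence[float], k: float = 0.2) -> float:
    """Mean log-probability of the k fraction of least likely tokens.

    Min-K%-style scoring treats a HIGHER value as evidence of membership.
    The default k is an illustrative assumption.
    """
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]  # the n smallest log-probabilities
    return sum(lowest) / n


def reference_calibrated_score(
    target_logprobs: Sequence[float],
    reference_logprobs: Sequence[float],
) -> float:
    """Target-model loss minus reference-model loss on the same text.

    A LOWER (more negative) value means the target model fits the text
    unusually well relative to the reference model.
    """
    return loss_score(target_logprobs) - loss_score(reference_logprobs)


def is_member(score: float, threshold: float, lower_is_member: bool = True) -> bool:
    """Turn a score into a membership decision with a fixed threshold."""
    return score < threshold if lower_is_member else score > threshold
```

Each function reduces a sequence to one scalar that is then compared against a threshold, which is the sense in which these attacks are heuristics chosen by hand.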