Probe-Geometry Alignment: Erasing the Cross-Sequence Memorization Signature Below Chance
arXiv:2605.01699v2 Announce Type: replace
Abstract: Recent attacks show that behavioural unlearning of large language models leaves internal traces recoverable by adversarial probes. We characterise where this retention lives and show it can be surgic…