What do your logits know? (The answer may surprise you!)
arXiv:2604.09885v1 Announce Type: new
Abstract: Recent work has shown that probing model internals can reveal a wealth of information that is not apparent from the model's generations. This poses the risk of unintentional or malicious information leakage, where…