Correcting Suppressed Log-Probabilities in Language Models with Post-Transformer Adapters
arXiv:2604.14174v2 Announce Type: replace-cross
Abstract: Alignment-tuned language models frequently suppress factual log-probabilities on politically sensitive topics despite retaining the knowledge in their hidden representations. We show that a 786…
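The abstract does not specify the adapter architecture, but the idea of a post-transformer correction can be sketched as a small additive map from the model's final hidden state to a logit adjustment, applied after the frozen backbone. Everything below (dimensions, zero initialization, the additive form) is an illustrative assumption, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the abstract): hidden width and vocab size.
d_model, vocab = 16, 32

# Frozen base model pieces: a final hidden state h and the LM head W_lm.
h = rng.standard_normal(d_model)
W_lm = rng.standard_normal((vocab, d_model))

# Post-transformer adapter: a trainable map from the hidden state to an
# additive logit correction. Zero initialization means the adapter starts
# as a no-op, leaving the base model's distribution untouched.
W_adapter = np.zeros((vocab, d_model))

def log_probs(hidden, adapter):
    # Base logits plus the adapter's additive correction.
    logits = W_lm @ hidden + adapter @ hidden
    logits -= logits.max()                      # stabilize the softmax
    return logits - np.log(np.exp(logits).sum())

base = log_probs(h, np.zeros_like(W_adapter))
corrected = log_probs(h, W_adapter)
# With the adapter at zero, corrected log-probabilities match the base model.
assert np.allclose(base, corrected)
```

Training such an adapter would then only update `W_adapter`, raising the suppressed log-probabilities the hidden state still encodes while leaving the backbone's weights fixed.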