Mitigating Content Effects on Reasoning in Language Models through Fine-Grained Activation Steering
arXiv:2505.12189v3 Announce Type: replace-cross
Abstract: Large language models (LLMs) exhibit reasoning biases, often conflating content plausibility with formal logical validity. This can lead to incorrect inferences in critical domains, where plausible…
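The truncated abstract names activation steering as the mitigation technique but gives no implementation details. As background only, a common form of activation steering computes a difference-of-means direction between two sets of hidden activations (e.g., from valid vs. invalid reasoning prompts) and adds a scaled copy of that direction to a hidden state at inference time. The sketch below illustrates that generic recipe on synthetic vectors; the function names, the dimensionality, and the use of a difference-of-means vector are all assumptions, not the paper's actual method.

```python
import numpy as np

def steering_vector(acts_a, acts_b):
    # Difference-of-means direction between two activation sets
    # (e.g., activations from "valid" vs. "invalid" reasoning prompts).
    return acts_a.mean(axis=0) - acts_b.mean(axis=0)

def steer(hidden, direction, alpha=1.0):
    # Shift a hidden state along the steering direction by strength alpha.
    return hidden + alpha * direction

# Synthetic stand-ins for layer activations (not real model outputs).
rng = np.random.default_rng(0)
d = 8
acts_valid = rng.normal(0.5, 0.1, size=(16, d))
acts_invalid = rng.normal(-0.5, 0.1, size=(16, d))

v = steering_vector(acts_valid, acts_invalid)
h = rng.normal(size=d)
h_steered = steer(h, v, alpha=0.5)

# The steered state has moved along the steering direction.
moved = float(np.dot(h_steered - h, v))
print(moved > 0)
```

In practice such a direction would be extracted from, and applied to, a specific transformer layer (e.g., via a forward hook); the "fine-grained" aspect of the paper presumably refers to a more targeted variant than this layer-wide shift.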