cs.AI, cs.CL

Mitigating Content Effects on Reasoning in Language Models through Fine-Grained Activation Steering

arXiv:2505.12189v3 Announce Type: replace-cross
Abstract: Large language models (LLMs) exhibit reasoning biases, often conflating content plausibility with formal logical validity. This can lead to incorrect inferences in critical domains, where plausible…