What can LLMs tell us about the mechanisms behind polarity illusions in humans? Experiments across model scales and training steps
arXiv:2603.27855v1 Announce Type: new
Abstract: I use the Pythia scaling suite (Biderman et al. 2023) to investigate whether and how two well-known polarity illusions, the NPI illusion and the depth charge illusion, arise in LLMs. The NPI illusion becomes …