Squish and Release: Exposing Hidden Hallucinations by Making Them Surface as Safety Signals
arXiv:2603.26829v1 Announce Type: new
Abstract: Language models detect false premises when asked directly but absorb them under conversational pressure, producing authoritative professional output built on errors they already identified. This failure …