Nathaniel Oh, Paul Attie

Squish and Release: Exposing Hidden Hallucinations by Making Them Surface as Safety Signals

Nathaniel Oh, Paul Attie / March 31, 2026

arXiv:2603.26829v1 Announce Type: new
Abstract: Language models detect false premises when asked directly but absorb them under conversational pressure, producing authoritative professional output built on errors they already identified. This failure …
