Truth as a Compression Artifact in Language Model Training
arXiv:2603.11749v3
Abstract: Why do language models trained on contradictory data prefer correct answers? In controlled experiments with small transformers (3.5M–86M parameters), we show that this preference tracks the compress…