cs.AI, cs.LG

Architecture Determines Observability in Transformers

arXiv:2604.24801v1 Announce Type: new
Abstract: Autoregressive transformers make confident errors, but activation monitoring can catch them only if the model preserves an internal signal that output confidence does not expose. This preservation is det…