Michael J. Clark - Provide.ai

AntiPaSTO: Self-Supervised Honesty Steering via Anti-Parallel Representations

Michael J. Clark / April 21, 2026

arXiv:2601.07473v4 Announce Type: replace
Abstract: As models grow more capable, humans cannot reliably verify what they say. Scalable steering requires methods that are internal, self-supervised, and transfer out-of-distribution; existing methods sat…
