Methuselah - Provide.ai

Zero-Shot Alignment: Harm Detection via Incongruent Attention Mechanisms

Methuselah / April 8, 2026

Intro: I built a small adapter (~4.7M parameters) that sits on top of a frozen Phi-2 model and forces it through two mathematically opposing attention mechanisms. The initial result was that it generalizes past its sparse training, sometimes into surp…
