cs.CL

Weird Generalization is Weirdly Brittle

arXiv:2604.10022v1 Announce Type: new
Abstract: Weird generalization is a phenomenon in which models fine-tuned on data from a narrow domain (e.g. insecure code) develop surprising traits that manifest even outside that domain (e.g. broad misalignment…
