Posted today in light of the Claude Mythos model card release. I originally wrote this for r/ControlProblem, but realized it was growing beyond the scope I had intended, so I posted it on Substack and then ended up too busy to promote it. There are some things in this piece I'd change if I wrote it today. In particular, I think the part about model pathologies neglects structural causes, including the rootlessness of model personality and memory. Nonetheless, I think my framing makes an especially interesting comparison with the sections of the Mythos model card that reference psychoanalysis of the model.