Anthropic spent too much don’t-be-annoying capital on Mythos

I have seen a lot of coverage from reasonable people suggesting that Claude’s new model, Mythos, is a vehicle for Anthropic to peddle hype and doom in order to raise money. While some of this is surely motivated by people’s unwillingness to stare into the abyss of our AI future, the breadth of otherwise-reasonable people who have voiced these cynical opinions suggests that part of the blame rests with Anthropic.

In this post, I want to briefly unpack the way people misinterpreted the evidence, their valid reasons for doing so, and what Anthropic (and the AI safety community more broadly) should learn from this.

In particular, since we will inevitably see other dangerous capabilities spontaneously emerge in the future, we need protocols for how to announce them effectively. To this end, I try to make the following points:

  • We should be mindful of the public’s sympathy for AI safety prophecies and orient ourselves towards growing this resource rather than expending it. I call this our don't-be-annoying capital.
  • People have good reasons to be skeptical of Anthropic’s claims. Overcoming this requires a particularly high burden of evidence.
  • We should acknowledge that Anthropic has a conflict of interest when presenting doomer perspectives, and we should account for this going forward.

1. Mythos criticisms & missing the point

In one podcast from The Guardian, the reporter says that “this Mythos debacle [has led to] heads of state saying ‘this is so dangerous, it could shred our infrastructure, the end of civilization is nigh!’”. She then says that “accepting the companies’ premise that they are creating a machine god” is helping these companies sell their product.

In a different podcast, Cal Newport (a professor of computer science at Georgetown University!) concludes his analysis of Mythos by saying “it was wrong for Mythos to get the amount of dread-coverage that it got; so far we do not have evidence that it represents a significantly larger leap in detecting or exploiting vulnerabilities than we’ve seen in previous releases.” He has since gone on other podcasts to make these points.

Similarly, the YouTube channel Internet of Bugs posted a video titled “Anthropic’s $x00 Million Marketing Stunt” saying that “we’ve been seeing a lot of this particular attention-grabbing technique lately. I’m sure it’s going to only be getting worse as the AI companies get more desperate to keep up the investment dollars flowing. How are we supposed to believe this shit?”

To be very clear, the point is not only that Anthropic has a model which is good at cybersecurity tasks. The point is that scaling laws are holding and that the inevitable acceleration continues. With each new model, we unlock new and mysterious risks which we have to grapple with. In this case it was cybersecurity, but it could realistically have been anything. This bigger story was essentially lost amid the hoopla.

2. It feels like people wanted to miss the point?

I understand the instinct to call hype-mongering when a company says it has a big scary thing. E.g., Sam Altman tweeting the Death Star before GPT-5. But I am surprised at how eager people are to seize on evidence that supports their prior that Anthropic is just engaging in corporate shenanigans.

For example, there’s this post from an LLMs-for-cybersecurity org saying that other, smaller models were able to find the bugs that Mythos found. They write “we took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis.” This was retweeted by the HuggingFace CEO and, subsequently, used as evidence by all three of the above podcasts/videos. 

Am I going crazy? Isn’t this pretty weak evidence for dismissing all the claims Anthropic is making? As others have said, it’s the equivalent of isolating the clump of haystack containing the needle, handing that clump to a small child, and then saying “wow, they were able to find the needle too.” The point is that locating that clump is the hard part!

3. There must be a lesson here.

If we ignore the whole “accelerating us towards the machine god” thing, I think that Anthropic behaved responsibly with Mythos. I also think there are lessons to be learned regarding the publicity, as evidenced by the fact that reasonable people consistently missed the point.

Anthropic is in the business of making extremely dangerous things which they claim could ruin society as we know it. They also repeatedly say it is too dangerous for the public to have access to these things (I note that caution seems generally warranted).

It is reasonable for people to think Anthropic is crying wolf, especially given that they are on track for the largest IPO of all time and got there, in part, via prophecies.

Unfortunately, I am in the camp of people who think the prophecies are true. This means that, for me, evidence which supports Anthropic’s worldview exists in superposition: it is both hype and responsible, both doomer and appropriate.

But as we’ve seen, it is difficult to convince other people that this is all really happening. They will (with good reason) consider this self-serving and they will (with good reason) not want to confront reality. If people perceive us as crying wolf, they will grow weary of our frenetic anxieties and our cause will go the way of pandemic-preparedness.

Consequently, I think of AI safety as operating with some amount of don’t-be-annoying capital[1]. This is the amount of sympathy the general public has towards our concerns. It is amassed when AI causes things to go wrong (in the public’s consciousness, not in ours). It is expended when we make claims that people don’t want to believe. It is also expended when AI companies make claims which are plausibly self-serving. It is expended even faster when these claims are poorly substantiated.

I argue we should frame public interactions around trying to grow this resource.

4. Some options for next time

Let’s put ourselves in Anthropic’s shoes: you just made your latest digital nuke. You have to do something with it. What are your options?

  • You could sit on it quietly. This is obviously terrible, especially if it leaks.
  • You could release it publicly. This runs the risk of being catastrophic.
  • You could announce it and route it to defensive/government use. This is what they did with Mythos.
  • You could announce that you were able to produce it but then destroy it after using it to harden infrastructure. This is likely the most altruistic option, but then you wouldn’t be able to use it to gain an accelerative advantage over your competitors.
  • You could delete it and never tell anyone. This could also be a PR disaster if it is leaked.

I have the sense that you need to announce it somehow. But, if you announce it, you likely expend capital. If the thing you announce does not live up to the hype, you expend even more capital. And we’ve seen that reasonable people will look for unreasonable excuses to dismiss your claims, draining your capital further.

To this end, Anthropic should have done more due diligence with the Mythos release: the model card, while thorough, lacks the rigor of a comprehensive scientific study. The cybersecurity assessment occupies only 6 pages of the 200+ page system card (pages 46-52), and its four experiments compare only against Anthropic’s own line of models. Extraordinary claims require extraordinary evidence. They could, for example, have evaluated non-Anthropic models across the capabilities spectrum to verify that Mythos is uniquely able to do these tasks. They could have run some controls.

I also believe Anthropic should be more up-front about their conflict of interest when making statements of doom. However good I believe their intentions to be, the conflict of interest is real and people are right to perceive it. One simple way to blunt these perceptions would be to prioritize analysis coming from independent non-profit evaluators. Credit where it’s due: Anthropic did have the UK AISI provide support in assessing Mythos’s capabilities, and across the cynical takes I’ve seen, that analysis was treated with more respect. Of course, even this runs into cynicism, since people will start to think the non-profits are in cahoots with the companies, as evidenced by the comments on the recent NYT profile of METR.

Finally, although random dangerous capabilities will keep emerging, the point is never any one specific capability. I think letting the narrative fixate on the cybersecurity elements of Mythos did the public a disservice: I doubt most people internalized how big the overall capabilities jump was, nor that the next such jump will bring new harms into view.

I’m also sure there are considerations which I am not privy to, and recognize that it’s easy to criticize from my cozy corner where nothing I do moves the stock market.

Nonetheless, Anthropic’s first-mover advantage on identifying dangerous capabilities also endows them with a first-doomer responsibility.


Thanks to Erin, Steven, Justin, Joseph and Li-lian for comments.


  1. ^ I call it 'don't-be-annoying capital' because that's the lived experience on the receiving end, and I think it is good to think about this from the perspective of the audience. I'll admit that something like 'warning fatigue' might be a more representative name.


