I think we need to Stop AI. Specifically, we need to Stop AI Now. We can’t wait around. The standard metaphor is a runaway train heading towards a cliff. Let’s work with that.
We don’t know when to stop. We don’t know where the cliff is.
World’s most-cited scientist (and my Master’s supervisor) Yoshua Bengio says we’re racing into a bank of fog, and there could be a cliff. That’s about right. There are two implications of this: 1) maybe there’s no cliff and it will all be fine; 2) the cliff could be anywhere, and we can’t see far enough ahead to stop in time unless we’re going very slowly. So while a lot of people seem to think we’re going to see the risks clearly in time to stop, I’m not so sure.
The entire time I’ve been in the field, people have repeatedly been surprised by the rate of progress in AI. The people at the leading AI companies are an exception – the most vocal among them have been, if anything, overestimating how fast things move.
Progress could be sudden
There’s a dangerous idea that’s caught hold: that AI progress is predictable because of “scaling laws”. We’ve seen pretty consistent patterns in how quickly AI advances on particular metrics as a function of time. But there are a few problems with this: 1) the metrics don’t measure the things we care about; 2) there’s no reason these trends should hold if there’s a paradigm shift. Indeed, re (2), there’s already been one major shift: the deep learning era, in which massively more resources are being put towards AI year-on-year than before. The rate of progress changed.
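To make problem (2) concrete, here’s a toy numerical sketch (every number is invented, with no connection to any real benchmark): fit a clean power-law trend to a metric’s history, then watch the extrapolation fail the moment the growth regime shifts.

```python
import numpy as np

# Hypothetical metric that improves smoothly for a decade: metric = 2 * t^0.5
years = np.arange(2010, 2020)
t = years - 2009
metric = 2.0 * t ** 0.5

# Fit a power law, metric ~ a * t^b, to the pre-shift data (log-log regression)
b, log_a = np.polyfit(np.log(t), np.log(metric), 1)
a = np.exp(log_a)

# Extrapolating the old trend out to 2025...
t_future = 2025 - 2009
trend_prediction = a * t_future ** b        # -> 8.0

# ...but suppose a paradigm shift around 2020 doubles the growth exponent.
post_shift_reality = 2.0 * t_future ** 1.0  # -> 32.0

print(f"trend says {trend_prediction:.1f}, reality is {post_shift_reality:.1f}")
```

The fit is perfect on everything it was trained on, and still off by 4x a few years after the regime changes. That’s the point: a trend line can’t warn you about its own breakdown.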
There’s no reason this can’t happen again. Indeed, I think we should expect it to happen again for at least two different reasons.
First, at some point, when AI R&D really kicks into gear, we could discover learning algorithms that work much better than today’s. I think the current AI paradigm leaves much to be desired, with room for major improvements, e.g. in long-term memory and efficiency. And those could arrive suddenly, taking an AI system from “really useful, but still needs a lot of hand-holding” to “we’re not sure we can stop this thing, maybe we should, um… shut off all the computers?”
Second, at some point, AI agents could really take off (we may already be at the beginning of this), get very good at effectively and efficiently causing things to happen in the physical realm, and then start to rapidly and autonomously scale up the physical resources (e.g. energy) that AI directs towards accelerating both AI R&D and this very process of acquiring resources and influence.
We don’t know what sort of behaviors/capabilities are dangerous.
Another dangerous trend is an increasing focus on capabilities that are obviously dangerous, such as bioweapons or cyber-attacks, to the exclusion of unknown risks.
*This RAND report is an exemplar. I previously wrote a detailed response; maybe I’ll post it soon.
The “unknown risks” argument is: “When you play against a much better chess player, you know they will win, but you don’t know how.” The things you see coming, they also see coming. They do something else.
We should be worried about any system that is very smart posing a risk to us. Sometimes we can make a fairly strong case that a system lacks a particular capability, and that this makes it safe. For instance, an AI system that has only been trained to play games of Chess or Go is probably going to be safe, even if it’s an insanely good player.1
Arguments that might seem stronger than they are include:
It’s stuck in a computer; we can just unplug the computer.
Its memory is wiped after every interaction, so it would struggle to make and execute coherent long-term plans.
The problem with both arguments is that they assume that the AI cannot use its influence over the external world to acquire new capabilities. For instance, a smart AI that notices it is limited by such things could pay people to help give it a robot body or a better external memory, or trick them into it.
In general, it’s hard to know what to make of a system that is clearly really smart, and not fully understood. A lot of experts (Yann LeCun, Gary Marcus, …) claim that current approaches to AI are fundamentally limited, but this is just them stating their opinion, which many other experts disagree with. The reality is we just don’t know.
But even if the system is fundamentally limited in some way, it could still pose massive risks. For instance, lacking a sense of smell probably wouldn’t stop an otherwise superintelligent AI from taking over the world if it wanted to.
It takes time to slow down. The train doesn’t stop when we slam on the brakes.
What needs to happen, once “we” decide to stop? A rough list I have in my head is:
The US government decides to stop AI, and starts trying to broker an agreement with China and maybe a few other key players.
The US, China, etc. reach an agreement on how to stop AI.
The rest of the world gets on board with this agreement.
I expect these steps to take time, quite likely a lot of time. How do you actually stop AI? I have an answer, but there are still a lot of details to be worked out, and I don’t think we’ll really know the answer to this question until world powers actually start prioritizing this issue and are willing to make major sacrifices and compromises to achieve it.
A unilateral pause in the US could be implemented faster (but would still require navigating the politics of the thing, which could take arbitrarily long), and to be fair, I think this is what many people imagine a “pause” looking like: frontier AI companies suddenly cease their R&D operations; they send their researchers off on vacation, and stop their big training runs. And the US is ahead right now, so China wouldn’t immediately race ahead. How quickly might they catch up? Three considerations are: 1) How hard are they racing? 2) How far behind are they? 3) How reliant are they on copying US companies to make progress?
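For a rough sense of the timescales involved, here’s a back-of-the-envelope sketch combining those three considerations. Every name and number below is mine, invented for illustration; this is not a forecast.

```python
# Toy model of how long a unilateral pause lasts before a rival catches up.
# All inputs are hypothetical.
def months_until_caught_up(lead_months: float,
                           rival_speed: float,
                           copying_penalty: float) -> float:
    """
    lead_months:     the leader's current head start (consideration 2)
    rival_speed:     rival progress per calendar month, measured in months of
                     frontier progress, while racing flat out (consideration 1)
    copying_penalty: fraction of that progress lost once there is no moving
                     frontier to copy from, 0 to 1 (consideration 3)
    """
    effective_speed = rival_speed * (1.0 - copying_penalty)
    if effective_speed <= 0:
        return float("inf")  # a rival that only copies never closes the gap
    return lead_months / effective_speed

# E.g.: a 9-month lead, a rival moving at 0.75x frontier speed who loses a
# third of that speed without a frontier to copy -> the pause buys 18 months.
print(months_until_caught_up(lead_months=9, rival_speed=0.75, copying_penalty=1/3))
```

Plug in less favorable numbers and the answer shrinks fast.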
The problem with a unilateral pause is that it expires. You get a few months -- or a few years, if you’re lucky -- to figure things out, and then we’re off to the races again. But we can’t count on figuring things out in that amount of time! We don’t even know what we need to figure out. “Solving alignment” (as popularly conceived) may not be enough.
It’s getting harder to stop. The brakes are fraying.
It’s getting harder and harder every day to stop. Every day we wait, AI companies get richer. AI gets more embedded into society and infrastructure. AI gets smarter. AI research advances. More people get AI psychosis. More AI computer chips are built. More datacenters.
There’s a sense in which AI is already out-of-control.2 The very people building it have repeatedly expressed concern, fear, anxiety, dread, apprehension, etc. about the risks it brings. They say they would like to slow down, if they could. Elon Musk says he has AI nightmares. They don’t seem to feel like they’re in control. As easy as it might seem for AI CEOs to say “oh, damn, this model really is dangerous, we’d better pause”, it’s not clear they’ll actually be able to do so at the critical moment. Maybe a dangerous AI has already escaped human control. But more generally, I am concerned that we will increasingly lack a “nexus” of control at all.
“Racing to the cliff” is not a good strategy.
Despite the risks, a lot of people I talk to in the AI Safety community think that we should keep building more powerful AI until it’s clearer that it’s getting too dangerous. One argument is: until AI literally kills everyone, it’s basically just great; let’s keep getting all the benefits for as long as we can. This is mostly a vibes-y thing. These people are “pro-technology” and really, really don’t want to be mistaken for Luddites.
The more significant argument is this: the whole point of pausing is to do more safety/alignment research, and we make the best use of wall-clock time when we have more advanced models to study and to help us with the research. This is clearly a 12-D chess move. In order for it to work, we’d need to know where the precipice is, we’d need to slam the brakes before we get there, and we’d need to make sure that nobody cuts the brakes in the meantime. I’m not optimistic about any of those things working out, but the plan requires all three of them to. I say: slam the brakes now, while we still can, and hope that we can stop soon enough.
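Just to put numbers on that conjunction (the probabilities are made up; the point is how the product behaves):

```python
# The racing-to-the-cliff plan needs all three conditions to hold at once.
# Invented, purely illustrative probabilities:
p_see_precipice  = 0.5  # we recognize the danger line before crossing it
p_brake_in_time  = 0.5  # decision-makers actually slam the brakes
p_nobody_defects = 0.5  # no lab or state cuts the brakes in the meantime

p_plan_works = p_see_precipice * p_brake_in_time * p_nobody_defects
print(p_plan_works)  # 0.125 -- coin-flip odds on each step leave 1-in-8 overall
```

Even generous odds on each step compound into long odds on the whole plan.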
1. Even such a limited system might in principle be able to discover things about the world outside the game and want to gain influence over it -- this would be a bit like the plot of “The Matrix”, but in reverse. This is an area where we do have some uncertainty, but where I’m comfortable saying “I don’t think we need to worry about that yet”.
2. See number five of Ten different ways of thinking about Gradual Disempowerment.