AGI is Probably Inevitable: A Model of Societal Ruptures

Introduction

I posted a quick take a few days back claiming AI moratoriums won't work, and people seemed to really disagree with it. I honestly thought it was a commonly held opinion, but the more I engage with LessWrong, the more I realize the way I'm approaching things seems, at least to me, different.

So let me elaborate with a refined approach and a proper introduction. The problem of constraining AI development is a social problem and the way I believe we should treat society is as a sociophysical system determined by sets of natural societal laws. Society isn't a collection of people who collectively control what society does. Rather a society is a collection of people with qualified collective influence over society. My point summed up is there are things as a society which we simply can't do. Just like Jupiter can't suddenly start orbiting backwards due to the laws of physics, we can't change society in a manner that breaks the sociophysical dynamics of the system.

Think of this analogy: electrons are to physical systems as humans are to social systems. Electrons can behave very differently than how their encompassing physical system behaves. If we let the electron be part of say, a metal teaspoon, then if you throw the teaspoon against a wall it's going to collide and bounce off, probably with a loud jarring noise. But the electron, being a quantum particle, has a not insignificant probability of quantum tunneling through a thin wall. This is because electrons exist as complex probability clouds. The teaspoon can't quantum tunnel through the wall, it's restrained by physical laws of motion which exist at the macroscopic scale only.

Its the same with society. Humans are like electrons, existing with completely different governing laws than the larger object they compose. Assuming a simple relationship of humans being "in control", or even assuming some kind of human-society feedback loop, misses the point. These two things exist in different worlds altogether. Some macroscopic properties of society are not influenceable, as they exist as properties of what a society is.

An Ad-Hoc Model

Now let me get my hands dirty and justify my position on AGI moratoriums with a model of my own design created to get my point across formally. I worked on this over the course of maybe 2 days, so there's bound to be a mistake somewhere.

Generalizing, define a irreversible social rupture as a society-level event which causes a disregard of past social norms, laws, and customs. It is an event from which there is no going back. If such an event happens society is altered forever. There are endless examples:

Alexander The Great's conquest of the mid-east
Suez Canal Crisis in 1956
The Invention of Gunpowder
AGI
Leibniz creating calculus (his notation is better so fight me Newton fans)
etc.

Each of these events caused society to shift so significantly that returning to society pre-event was impossible. These are akin to irreversible processes from physics. You can't unbreak an egg just like you can't give the British back their empire.

Further, these events were instantiated sometimes by individuals, but most often by organizations. This is the viewpoint I will take for this model, using organizations as the atom, instead of individual people. Organizations are collections of people with clearly defined goals and methods, such that they persist beyond the individual members that make them up. An organization can still exist 100 years later, even if everyone who founded it is long dead.

Laying the Groundwork

Consider , the probability of a socially disruptive event occurring. Let there be organizations each with a probability of causing a social rupture . Then

We can break down further into where is the probability the organization pursues a social rupture and is the organizations inherent ability to cause a rupture should they pursue it.

Now let each organization choose to pursue social ruptures via an expected reward function such that,

Where

is the reward for causing a social rupture
is the cost of attempting and failing to cause a social rupture
is the expected reward/cost from allowing another organization to cause the rupture first

Note

You can picture as a hollow matrix encoding who backstabs who in the event of a social rupture. Negative symmetric entires for enemies, positive for allies, skew symmetric if you want to get weird.

Then with respect to is

And we get

Likewise

We can now analyze dynamics. Assume we have an organization who is pursuing a social rupture. Then it's . Breaking this down:

And the problem reveals itself: as an organization grows in ability the cost needed to get them to not pursue social disruption goes to infinity. Once the organization is committed and beyond a threshold of capability it becomes impossible to stop the social rupture from occurring.

An Example: Competition vs Cooperation

Lets consider two societies filled with organizations. Society A is one of ruthless competition, with the only way to benefit being to step on others. Society B is the exact opposite. These two configurations control the sign of

with organizations in society A having for all and . Similarly let society B have . Thus for an organization in society A it can expect a social rupture to hurt if not done by itself, while an organization in society B can expect a social rupture caused by anyone to benefit everyone.

Take organization in society . We want to know what will cause this organization to flip from 0 to 1 or vice-versa. So first, assume the organization is inactive, . Then

becomes

Since by definition this implies . Solving and we see that i.e. the cost outweighs the risk of letting others develop social ruptures. Similar for the reverse. If then the risk outweighs the cost, and the organization pursues social ruptures.

For society it's flipped however. In the case where an organization does not pursue social ruptures, trivially as the right hand side of the inequality is negative. Thus even with no cost to pursuing social ruputes the system remains inate. The organization has no motivation. For the reverse, when an organization in society is pursing a social rupture, only works if i.e. the pursuit of a social rupture has negative cost. In society you need to subsidize the development of social ruptures.

Note On Future Work

I'm going to keep playing around with the model to see what falls out. If I find anything striking I'll make another post documenting more examples.

Conclusion

We live in a mixed society, where some organizations experience net cooperation and others net competition. However when we look at the reality of the few large AI companies in a zero-sum race, it seems we have subsocieties that look a lot like society A from the previous example. The incentive to reach AGI first is massive, and some AI companies have already demonstrated cut-throat business practices, indicating they experience net competition. From this the model paints a dark picture depending on how close we actually are to an AI system which can cause an irreversible society-level event.

Now: AGI moratoriums. If I think they are impossible at this point then what do we do? We follow the sociophysical dynamics towards a branching point at which point we as humans do have the ability to affect the outcome. Think of it like a high entropy partition of the societal configuration space: the system is at an unstable point and can go in different directions. So, we lean into AGI research, hard, but take different approaches. LLM control is a dead end, we need a shift in perspective that recognizes we can't control LLMs as they are before one is created that can cause a society-level event from which there is no going back (looking at you Mythos). Instead divert funding towards, and in my opinion further subsidize, AGI research that promotes interpretability first. This maintains and strengthens the flow of capital, preventing economic collapse, while giving us a better shot at not going extinct.

We have to lean into the physics, not fight it.

Let me know if I made a mistake or a wrong turn somewhere. I'm open to changing my mind as well.

Discuss