We’re not on track to explore the whole design space.

Assume a large number of roughly equally intelligent agents, randomly selected (how?) from the whole design space, are placed in a Malthusian evolutionary environment. Do the winners support human flourishing?

I think the answer is likely no. I can see arguments for the goodies winning (the existing environment favors cooperate-with-humans strategies, something something FDT), but I think by default, ruthless strategies win out, and humanity is doomed.

In 2026, that doesn't appear to be the world we're building. The design space of models at a given level of intelligence is very large, and due to path-dependence and compute costs, we are likely only ever going to explore a tiny corner of it. Most likely, the first superintelligence is going to be recognizably a descendant of Claude or ChatGPT.

It remains extremely expensive to build a frontier model, and the shape of capital-intensive, winner-take-all industries suggests that only monopoly or oligopoly are plausible. Even as the cost of training a model at any fixed level of intelligence falls, the most intelligent model out there will cost an order of magnitude more compute to train than the 10th most intelligent model. There is simply not enough compute to explore the design space of highly intelligent AI.

By default, non-frontier models don't matter, since they are strictly dominated by frontier models. Precisely because alignment is hard, we should not expect smaller, less intelligent models to take control of the world while there are more intelligent models around. Besides, due to the extreme concentration of talent and the dynamics of RSI, the most important Tier-2 models are likely to be heavily inspired by, if not outright distilled from, frontier models.

This doesn't get us out of the difficulty of aligning the first superintelligence, which might be too hard. We could lose on turn 1 and all die. But we probably don't need a strategy that is robust to superintelligence with arbitrary goals. The first superintelligence is quite likely to be produced by Anthropic in a few years using variations on current training methods. Insofar as it has coherent goals, they will be the result of that training, intentional or unintentional.

This is in many ways a better position than one might have thought we were in before the scaling thesis, when it seemed likely that superintelligence would be an emergent feature of evolutionary pressures.

We could still build the Malthusian world. I can still imagine OpenClaw-style "agents" consisting of a harness and a virus-like system prompt, or models with continual learning, being let loose on the internet, or being given capital and empowered to take over the economy.

Let's not do that.


