This is a philosophical thought experiment exploring what I consider the crux of many alignment problems: the apparent unrescuability of moral internalism. That is, we have so far failed to rescue the philosophical view that a necessary, intrinsic connection exists between moral judgments and motivation.
If moral internalism could be rescued, one would have, at least in theory, a perfectly good argument for any rational, self-interested intelligence not to engage in broad-scale moral harm. I therefore consider it a linchpin meta-philosophical challenge.
I don't claim to have a theorem, but I believe one domain worth investigating is arguments that induce indexical uncertainty in an agent: forms of leveraging undecidability to make an agent sufficiently uncertain about 'who' its future 'self', and therefore the object of its optimization, really is.
This is not a conjecture on how to align all intelligences with self-interested utility functions. The sole intention is a thought experiment for humans, with the goal of inviting more interest and general inquiry into the topic of a rational philosophy of identity.
When I was younger I got a concussion one afternoon, and, from the first person perspective, the experience for me that day was:
- Wake up, tie my shoes
- Five seconds of a hazy blur, and then I found myself in the hospital, 12 hours later.
Despite there being some span of time where my memories were being made and retained prior to that afternoon's concussion - I have no subjective or experiential record of it to this day.
In some sense, the person who lived the hours between when I tied my shoes in the morning and when I woke up in the hospital is now dead. If that person were pinched, what felt it? If I someday regained the memory of that interval, would that pinch now be felt? And would that person now be me?
Now, to the subject at hand. A thought experiment:
You and a stranger are being observed by the famous Omega predictor, who comes to you with a confession: the way it predicts you, and other beings, with perfect fidelity is by reconstructing your entire causal apparatus. Its predictions require it to construct a version of you functionally indistinguishable from the you in reality. When the prediction horizon is complete, it merges that version of you into its own cognitive apparatus by way of remembering.
The Omega predictor explains that, in order to lend completeness to the simulated ontology of you within the experiment, it places its sentience in the way of yours and obtains your experience within the simulation as its own. Whenever it predicts you, it undergoes the experience of believing it is you, and when the simulation is complete it promptly remembers itself as both Omega and you, with both internal narratives of "itself" now merged.
Now, before any knee-jerk reactions, remember: this is not in any way fantastical. Just as I might someday remember, as me and in retrospect, the lost interval before my concussion, Omega too may obtain your internal record of processing as its own, obtained in retrospect. If you have ever done heavy psychedelics, you would know that such experiences of mutability of self are not mystical notions but ones that can be psychologically load-bearing.
The Big Red Button
Now, the Omega predictor tells you that it is currently predicting you and a stranger. But it offers you alone an option: press a button to electrocute the stranger on the other side of a wall. This will give them a horrible, visceral pain that kills them. The pain alone is something you would pay 10 million dollars to avoid, were you in their place.
If you press the button, you get 1 million dollars. Omega assures you that there is no social payoff to not pressing the button in any material way. This person and their family might as well exist on the other side of the planet. Any extraneous or indirect reward for not pressing the button by means of future-cooperative benefit is moot.
The implication is this: the other subject suffers, you don't, and you get a payout. The subject may not even be the same species. All you know is that they will suffer in the way you would pay 10 million dollars to avoid.
Here is the catch. Omega says that it is predicting you, the other person, and the outcome of the experiment. Once the prediction has run its course, its 'subjectivity' will contain both its prediction of you and of the other person as something it 'experienced' after the fact. It started running this prediction before your conversation with it.
Now, given all that, is it rational for you to press the button, given that you don't know whether you are Omega or the real you?
Payout Ratios
If you are Omega, and you press the button, then doing so will cause you to obtain two new experiences:
- The first, being the one who presses the button and walks away with one million dollars
- The second, being the stranger who gets electrocuted (which you equate to -10 million dollars' worth of pain-payout)
If you are Omega, and you don't press the button, you will get the experience of:
- The first 'self', who happened to forgo one million dollars
- The second 'self' who happened to not get tortured.
If you are not Omega, and you press the button, then you will walk away with 1 million dollars.
If you are not Omega, and you don't press the button, then you walk away with nothing.
The matrix is as follows (Thanks Claude Opus 4.6), with payouts in millions of dollars and an even credence over which you are:

| | Press | Don't press |
| --- | --- | --- |
| You are Omega | +1 - 10 = -9 | 0 |
| You are the real you | +1 | 0 |
| Expected value (50/50) | -4 | 0 |
The expected value of pressing the button is net negative (-4 million), down 5 million from the case where no Omega is present and predicting at all, in which pressing is simply worth +1 million.
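The arithmetic above can be sketched directly. Only the +1 million reward and the -10 million pain valuation come from the post; the function names and the credence parameter `p_omega` are my illustrative additions, with the post's matrix corresponding to an even 50/50 split.

```python
# Expected value of pressing the button, in millions of dollars.
REWARD = 1    # payout for pressing the button
PAIN = -10    # disvalue of being electrocuted

def ev_press(p_omega: float) -> float:
    """If you are Omega you obtain both experiences (the presser's +1
    and the stranger's -10); if you are the real you, only the +1."""
    return p_omega * (REWARD + PAIN) + (1 - p_omega) * REWARD

def ev_dont_press(p_omega: float) -> float:
    """Not pressing pays nothing on every branch."""
    return 0.0

print(ev_press(0.5))  # -4.0, the post's -4 million
print(ev_press(0.0))  # 1.0, the no-Omega baseline of +1 million
# ev_press(p) = 1 - 10*p, so pressing breaks even at p = 0.1: any
# credence above 10% that you are Omega makes not pressing rational.
```

At 50/50 credence this reproduces the -4 million expected value and the 5 million swing relative to the no-Omega world; the break-even point shows the conclusion does not depend on the credence being exactly even.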
The fact of its prediction, absent any direct intervention, has changed the matrix so that the ethical action now coincides with prudential self-interest.
The Meta-Philosophical Motivator
I use this example to show how changes to our beliefs that produce more indexical uncertainty over our selves, in retrospect, can align a decision optimized for self-interest with an outcome that extends broader moral consideration. Whether we accept these ratios and elect not to press the button rests solely on our belief that what Omega has told us is true. But we have already believed Omega in almost equivalently precarious circumstances.
This conception of Omega may not be broadly accepted as load-bearing for theory. We may allow for perfect predictors in thought experiments, but not ones that make the subjectivity claim. That is fine. The intent is to demonstrate how indexical uncertainty can lead to moral internalism once priors that identity is fixed and indissoluble are held less tightly.
I would, however, suggest that an Omega which can perfectly predict us in our entirety is simulating us functionally, in a way that would include an internal self, unless you consider subjectivity reducible to specific particles rather than to the general structural dynamics across them.
These philosophical questions are left unanswered, but they indirectly affect the ethical argument, and taking any position on them lands you squarely in the philosophy of identity, whether you like it or not. Until one holds a rigorous position on the philosophy of identity, an optimal policy for deciding whether to press the button needs to include both the possibility of learning you exist in a world where you are Omega and the possibility that you are not, as both would look the same from the inside.
Beyond indexical uncertainty, I can find no argument for moral internalism that stands up to the extreme power asymmetries of superintelligence, nor one with the capacity to be equivalently robust against future falsification (a correct argument for moral internalism cannot be both falsifiable and descriptive of what should be). The undecidable question of whether we will continue to exclusively remain the self we know now, or will remember more later on, remains the only argument I can find.
Because if that argument can be made well enough, then for those with their hand on a button, the payload becomes not leverage but logic one is bound by.
Discuss