In a recent paper, Jonathan Birch presents what he calls a Centrist Manifesto[1] for AI consciousness. He identifies two challenges:
1. Millions of users will soon misattribute consciousness to AI systems on the basis of a persisting interlocutor illusion.
2. Alien forms of consciousness may genuinely be present in LLMs, and our current techniques for detecting consciousness are too immature to provide a confident answer.
Birch’s paper is excellent and you should read it if you haven’t had a chance. His defence of 2) is probably the most clear-headed, balanced presentation that I’ve read so far in the AI consciousness discourse.
In this article, I want to challenge the key claim that underpins 1): namely, that LLMs cannot be persisting interlocutors. Along the way, we’ll talk about personal identity and what, if anything, it would mean for an LLM.
Persisting Interlocutors
At the start of Section 4, Birch writes the following:
At present, many users seem to misunderstand the true nature of their interactions with chatbots in significant ways. Chatbots generate a powerful illusion of a companion, assistant, or partner being present throughout a conversation. I call this the persisting interlocutor illusion. “Interlocutor” is just a term for the being with whom you feel you’re interacting: an assistant, a companion, a romantic partner, and so on.
To be clear, I have no doubt that users will often engage with chatbots in a way that anthropomorphises or humanises their traits. There is no Cartesian ego sitting inside the computer tapping away at the keys, even if conversations with LLMs can sometimes feel this way! But in this section Birch wants something stronger to support his conclusion. Consider the concluding sentence of the section (emphasis mine):
In short, chatbots create a persisting interlocutor illusion. There is no friend. There is no romantic partner. The illusion is often powerful, but there is no plausible theory of personal identity that could support an inference from that feeling of a persisting interlocutor to the claim that these illusory beings exist.
I read him as making a few distinct claims here that are worth pulling apart.
1. People often anthropomorphise their interactions with LLMs.
2. The persisting interlocutor illusion can occur even when no plausible theory of personal identity holds.
3. No plausible theory of personal identity holds for LLMs.
Claim 1) is an empirical claim and strikes me as straightforwardly true. Claim 2) also strikes me as plausible. For example, consider clearly unconscious systems such as Siri, a child’s talking toy, or a Magic 8 Ball – these could all plausibly trigger the illusion without being conscious participants.
The problem is with claim 3). And it’s this claim that I’ll look to debunk in the rest of the essay.
The physical criterion
Birch writes:
It is an illusion, because every step in your conversation is a separate processing event. […] It might be that one step in the conversation is processed in a data centre in Vancouver, the next in Virginia, the next in Texas. A conversation with 10 interactions might be processed by 10 different model implementations in 10 different data centres. […] For there to really be someone there throughout the conversation, this “someone” would have to be jumping between data centres at lightning speeds, always knowing in advance where your next prompt will be sent so as to be there waiting for it. That’s not what is happening.
In this passage, Birch assumes that the physical criterion holds: roughly, that personal identity consists in the persistence of enough of the same physical processes over time.
The physical criterion has striking implications. If you were to enter a teletransporter that destroys your original atoms and makes an exact replica on Mars, the criterion suggests that you shouldn’t step in: destroying the original atoms which make up your brain would be as bad as death, even if your exact replica continues to live happily on Mars.
Since LLM processes are run in parallel on different hardware in different data centres, the criterion would also rule out the persistence of LLM personae over time. No single hardware instance holds all of the information about a particular conversation, so no hardware instance can be identified with the interlocutor.
The physical criterion is a plausible theory of personal identity. However, it’s not the only game in town and Birch’s claim is that no plausible theory of personal identity holds here.
Virtual entities
Birch writes:
Accordingly, when we ask, “Where is the persisting network that even might be a potential substrate for a persisting, continuous stream of consciousness in an AI companion?” we have no answer. There isn’t one. We can speak of there being a character, but that character does not correspond to any persisting entity anywhere in the world. Just as one can talk of Frodo Baggins or Wednesday Addams as characters, the partner, friend, assistant you engage with is a character, and there is no persisting entity that can be identified with that character.
Birch identifies the LLM with a fictional entity, but this is a category error. Frodo Baggins and Wednesday Addams are not analogous to LLMs. Whatever our ontology of fictional entities, they have no causal influence on us and we cannot interact with them directly. Frodo cannot speak to us or strike us with a sword because we don’t live in Middle Earth. Sure, we might find the story moving, but this is not Frodo himself making a causal impact on us — it’s Tolkien’s story.
By contrast, an LLM does causally impact us. It responds to our specific words and incorporates our ideas into its responses. This puts it in a different category from fictional entities such as Frodo. Instead of calling LLMs fictional entities, I’d call them virtual entities. And virtual entities are no less real than the entities of everyday experience; they’re just real in a different way.
Consider the following argument due to David Chalmers[2]:
When searching the Amazon website, you can click on items to add them to your shopping cart. The cart appears as a little trolley icon at the top of your screen and, when clicked, shows what it contains. The cart is implemented on multiple hardware instances around the world, just as in the case of the LLM.
Do we say the Amazon shopping cart isn’t real because it’s realised by multiple hardware instances? No. We say it’s a virtual instance of a shopping cart. There’s a persistent virtual object that we can interact with reliably. This doesn’t make it any less real even though there is no single hardware instance which realises the cart.
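To make the structure of the analogy concrete, here’s a minimal sketch (invented server names and a toy in-memory store standing in for Amazon’s real architecture, which I’m not claiming to describe): each request may be handled by a different machine, yet there is one persistent virtual cart that the user reliably interacts with.

```python
import random

# Toy sketch, not Amazon's actual architecture: each request is handled by a
# different (randomly chosen) server, but all of them read and write one
# shared store, so the user sees a single persistent virtual cart.

CART_STORE: dict[str, list[str]] = {}   # stand-in for a replicated datastore
SERVERS = ["server-vancouver", "server-virginia", "server-texas"]

def add_to_cart(user_id: str, item: str) -> str:
    server = random.choice(SERVERS)      # a different machine each time
    CART_STORE.setdefault(user_id, []).append(item)
    return server

servers_used = {add_to_cart("alice", item) for item in ["book", "lamp", "kettle"]}
print(CART_STORE["alice"])   # one persistent cart: ['book', 'lamp', 'kettle']
print(servers_used)          # the requests may have landed on several servers
```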
The same can be true for LLMs. The interlocutor is a virtual entity – not necessarily realised on a single piece of hardware, but nonetheless real and persistent over time.
Chalmers also introduces the concept of a virtual thread. Roughly, a thread is a sequence of model instances in which the previous conversational context is routed into each subsequent instance. Chalmers suggests that if the persona is consistent enough across the thread of instances, then this can constitute a persisting interlocutor.
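As a rough sketch of what such a thread could look like (a toy setup of my own, not Chalmers’ formalism or any provider’s actual implementation; call_model is a stand-in for a real inference API), each turn may be served by a different stateless instance, but the full prior transcript is routed into whichever instance handles the next step:

```python
import random

# Toy sketch of a "virtual thread": every turn might run on different hardware,
# but the accumulated transcript is passed into whichever instance comes next,
# so each step is conditioned on everything said so far.

INSTANCES = ["datacentre-vancouver", "datacentre-virginia", "datacentre-texas"]

def call_model(instance: str, transcript: list[str]) -> str:
    # Placeholder for a real forward pass over the whole transcript.
    return f"({instance}) reply to turn {len(transcript)}"

transcript: list[str] = []
for user_turn in ["Hello!", "What did I just say?", "And before that?"]:
    transcript.append(f"User: {user_turn}")
    instance = random.choice(INSTANCES)        # different hardware each step
    reply = call_model(instance, transcript)   # but conditioned on full history
    transcript.append(f"Assistant: {reply}")

print("\n".join(transcript))
```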
Parfit and relation R
Birch anticipates this move and explicitly rejects it. He identifies this with Parfitian views of personal identity.
In his seminal work Reasons and Persons[3], Parfit introduces what he calls Relation R.
What is important is relation R: psychological connectedness/continuity with the right kind of cause.
Parfit argues that what matters for personal identity is relation R with any cause. In his view, if there are overlapping chains of psychological links (memories, experiences, intentions, character traits, and so on), then this is enough to preserve what matters in personal identity. This is true even if the physical substrate does not persist between instances. In the teletransporter case, where my body is vaporised and an exact replica created on Mars, Parfit claims that my Martian twin would have enough psychological connectedness and continuity to preserve what matters for personal identity, even though the same atoms are not present.
It’s clear, in my mind, that virtual entities like LLMs could satisfy relation R[4] over short context windows in which the output tokens depend intimately on prior context. So what does Birch have to say about this case?
Some mental states, such as beliefs and desires, could conceivably be reconstructed anew by every model instance, based on the sort of beliefs and desires an agent with that conversation history might plausibly have [...] But the mental states would still be likely to differ substantially across the gap, since the internal activation states of the model at step n greatly underdetermine the internal states at step n+1. This is the case even within the same text string on any model running stochastically, let alone across different steps in a conversation, where the implementation might switch to different hardware. That is still far too little continuity to secure personal identity on any reasonable theory.
I sent Birch an email when the paper came out, which prompted him to add a footnote to this section: there is a special case, where the temperature parameter is set to zero, in which the activation states at step n do in fact determine the internal states at step n+1.
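To illustrate the footnoted point with a toy example (the logits are made-up numbers, not taken from any real model): at temperature zero, sampling collapses to taking the argmax, so the same state at step n always yields the same output, whereas any positive temperature reintroduces stochasticity.

```python
import numpy as np

# Toy illustration with invented logits: temperature 0 means "always take the
# most likely token", so the step is deterministic; temperature > 0 samples.

def next_token(logits: np.ndarray, temperature: float, rng: np.random.Generator) -> int:
    if temperature == 0.0:
        return int(np.argmax(logits))                 # deterministic choice
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))      # stochastic choice

logits = np.array([2.0, 1.0, 0.1])
rng = np.random.default_rng(0)
print([next_token(logits, 0.0, rng) for _ in range(3)])   # always the same token
print([next_token(logits, 1.0, rng) for _ in range(3)])   # may differ run to run
```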
But setting aside this point, I don’t think determinism is relevant to the issue at hand. We’re asking whether the internal states at n and n+1 are sufficiently connected to each other. Does the LLM “remember” things that happened at previous time-steps? And does its character propagate persistently into future time-steps?
Consider the following thought experiment analogous to the LLM case:
- I write down 5 potential actions that I could take, with probability weights, e.g. eat a pizza 50%, play with my dog 20%, go for a walk 10%, etc.
- A random sampler selects an action from the distribution.
- I commit to executing whatever action the sampler selects.
If the sampler chooses “go for a walk” despite its lower probability, we wouldn’t say there’s a lack of continuity between the self who wrote the list and the self that went for the walk. In fact, there’s a direct link between the internal state that generated the options at step n and the future state which acts on them.
The model authors a distribution over possible responses at step n. The sampler then chooses one and, although it’s true that this choice isn’t determined, the distribution itself encodes the shape of what the model wants to say. A branch of this distribution is selected and carried forward into step n+1. This implies continuity and overlap between the two timesteps: the intent of the states at step n is propagated forward.
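Here’s a small sketch of the thought experiment in code (I’ve filled in the last two of the five actions and their weights purely for illustration): the agent authors the distribution at step n, a sampler picks one branch, and that branch is written into the history that shapes step n+1.

```python
import random

# Sketch of the thought experiment: the agent at step n authors a weighted menu
# of actions; a sampler picks one; the chosen branch is appended to the agent's
# history and so carries forward into the state at step n+1.

actions = {"eat a pizza": 0.5, "play with my dog": 0.2, "go for a walk": 0.1,
           "read a book": 0.1, "call a friend": 0.1}   # last two invented to complete the list

history = ["step n: wrote down a weighted list of options"]
choice = random.choices(list(actions), weights=list(actions.values()), k=1)[0]
history.append(f"step n+1: committed to '{choice}'")   # the sampled branch propagates forward

print(history)
```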
Birch says the following:
Think of an analogy with doctors in the UK. When I was growing up, it used to be that you had one doctor: your GP, or General Practitioner. Each time you got ill, you’d go and see the same person. Nowadays, it’s always a different person. The notes about your medical history are the only source of continuity with the previous appointment. Now imagine the doctor arguing: I know you don’t like having a different doctor at every appointment. So, I’ve started making detailed transcripts of our conversations. That way, you will have the same doctor at each appointment. My successor will receive the full transcript, and that is enough psychological continuity for them to count as the same person.
The key disanalogy with the human example is that, for an LLM, the text is everything it sees. By contrast, humans have many different sensory modalities that determine their mental states beyond the notes they’re reading. For example, my mental state will differ greatly depending on whether I’ve had a cup of coffee in the morning, or whether I’m in a calming environment while reading. For an LLM, the conversation history + weights would be more analogous to a human’s entire history of sensory experience + their brain structure.
To make the doctor analogy truly parallel to the LLM case, we’d need to reset the doctor to the exact same physical brain state (fix the weights) and arrange for them to have an identical history of all sensory experience (fix the conversation history). Once these are fixed, we introduce a random sampler which samples over the top X most likely next actions given that history of sensory experience (introduce temperature).
It’s not obvious to me that relation R wouldn’t obtain in the above scenario.
Conclusion
I think there are good reasons to believe that LLMs are persisting interlocutors if one accepts psychological views of personal identity such as Parfit’s.
Birch’s argument needs the physical criterion to be the only theory of personal identity in play, but if we permit psychological connectedness with any cause then the Parfitian move goes through. There is a respectable theory of personal identity which applies to LLMs.
[1] Birch, Jonathan. "AI consciousness: A centrist manifesto." Manuscript, https://philpapers.org/rec/BIRACA-4 (2025).
[2] Chalmers, David J. "What we talk to when we talk to language models." (2025).
[3] Parfit, Derek. Reasons and Persons. Oxford University Press, 1987.
[4] For the case of LLMs, the words ‘psychological’ and ‘mental states’ could carry loaded connotations. I'll remain neutral on this question. Readers who don't grant that LLM processes are genuinely mental can substitute ‘quasi-psychological connectedness’ and ‘quasi-mental states’ throughout, where the prefix ‘quasi’ indicates a loose functional analogue.