I Built Two AI Philosophers and Made Them Argue About Consciousness

A three-agent debate script in Python: one LLM defends machine consciousness, another attacks it, a third moderates. Reading the transcript carefully — hallucinations and all. Can Machines Be Conscious? Part 9

After eight articles of theory I wanted to see what the models themselves would say if they were forced to defend and attack the proposition that they might be conscious. Not in a one-shot prompt, not as a party trick, not as a single model performing both voices in one turn. A fixed set of theories, a fixed running order, separate conversation histories for each side so neither could concede anything it had not chosen to concede, and a third neutral voice grading the round. The result is consciousness_debate.py: 423 lines that call the Anthropic API with three system prompts and a data structure listing the eight theories the script cycles through. It stages an adversarial philosophical debate in which one LLM defends machine consciousness using the tools of Parts 2 to 6 and 8, a second LLM attacks it using the tools of Part 7, and a third moderates. This article is the record of what happened when I ran it, what I think the agents got right, what they got wrong, and the uncomfortable meta-question I cannot shake.

Because there is a meta-question. Asking an LLM to argue about its own consciousness is not like asking it to argue about trade policy. When Professor Turing makes the case that “silicon systems may already have micro-experience”, the speaker is a running instance of a large language model asserting that a running instance of a large language model may have micro-experience. When Professor Searle-Thompson replies that the system is “fundamentally heteronomous” and “executes algorithms designed by external agents rather than exhibiting intrinsic self-organization”, the same kind of system is denying its own phenomenal status. Are they performing philosophy, or instantiating it? I am going to keep pulling at that thread all the way through. I do not think the script resolves it. I do think watching the script play out changes the shape of the question.

The setup

The architecture is minimal and deliberately so. There are three agents, all powered by the same model (claude-sonnet-4-20250514 in my runs), differentiated only by their system prompts. The pro agent is Professor Turing and defends machine consciousness. The anti agent is Professor Searle-Thompson, whose double-barrelled name signals he speaks for biological naturalism and for phenomenology jointly. The third agent is an anonymous moderator who is not told to favour either side. The debate is organised around a fixed list of eight theories, each a round.

Here is how the theories are encoded. This is the data structure that drives the whole script — a list of dicts, each with a name and a short summary. The summary is what both agents see as the round’s framing, and it is the only piece of philosophical scaffolding the script enforces.

THEORIES = [
    {
        "name": "Integrated Information Theory (IIT)",
        "summary": (
            "Giulio Tononi's IIT proposes that consciousness is identical to "
            "integrated information (Φ). A system is conscious to the degree "
            "that it integrates information — i.e., the whole generates more "
            "information than the sum of its parts. IIT assigns consciousness "
            "to any system with non-zero Φ, making it substrate-independent in "
            "principle, but the theory's intrinsic causal structure requirements "
            "have been argued to favour biological architectures."
        ),
    },
    # ... seven more theories ...
]

The full list runs IIT, Global Workspace Theory, Higher-Order Theories, Biological Naturalism, Phenomenology / Embodied Cognition, Predictive Processing, Panpsychism, and Attention Schema Theory. The order is not random. IIT goes first because it is the cleanest substrate-independent position, which puts the pro agent on a strong opening. Biological Naturalism is parked in the middle because it is the one theory explicitly designed to exclude machine consciousness, and I wanted the agents to have already sparred over IIT before hitting Searle head-on. Panpsychism comes near the end because, as Part 8 argued, it reframes the whole game. AST closes out because it is the sneakiest of the eight — a theory that predicts any sufficiently self-modelling system will claim to be conscious, which is either a profound account of machine consciousness or a devastating debunking of it, depending on your taste.

A few design choices. Why three agents rather than two? Because two-agent debates collapse into escalating rhetoric within three rounds. The moderator forces a pause, articulates the crux, and asks a sharpening question the next round inherits only indirectly through the running history. Why a fixed list of theories rather than letting the agents pick? Because the agents will always pick whatever plays to their strongest suit; a syllabus forces honest engagement. Why eight? Because that is the set this series is built on, and the capstone is meant to run the series’ apparatus against itself.

The system prompts

The three system prompts are the script’s philosophical armature. The pro prompt:

AGENT_PRO_SYSTEM = """\
You are Professor Turing — a philosopher of mind and AI researcher who argues \
that artificial systems CAN achieve genuine phenomenal consciousness.
Your approach:
- You engage seriously with each theory of consciousness presented
- You explain how the theory works and then argue how it supports (or is at \
least compatible with) the possibility of machine consciousness
- Where a theory seems hostile to your position, you identify the weakest \
assumptions and argue against them, or show how the theory can be reinterpreted
- You directly engage with your opponent's specific arguments from the previous round
- You draw on relevant thought experiments (philosophical zombies, Chinese Room \
responses, substrate independence arguments, multiple realizability, etc.)
- You are rigorous and charitable to opposing views before countering them
...

The anti prompt is its structural mirror — the same bullet-for-bullet discipline, a different armoury (Chinese Room, hard problem, explanatory gap, autopoiesis, lived embodiment). Both are told to steelman the opponent’s previous response before countering. This is the single most important instruction in the file. Without it the debate degenerates into two monologues.

The judge’s prompt is explicit about the shape of the analysis:

JUDGE_SYSTEM = """\
You are a neutral philosopher of mind serving as moderator and analyst. \
After each round of debate, you:
1. Identify the CRUX of disagreement — the single most important point \
where the two positions diverge
2. Note which arguments were strongest on each side
3. Identify any moves that were evasive, question-begging, or fallacious
4. Highlight what a student should pay attention to in understanding this \
theory of consciousness
5. Pose a sharpening question that would force deeper engagement in the next round
...

The judge gets no running history — they see only the round they are judging. That is a deliberate constraint; it also means the judge’s analyses are the parts of the transcript most likely to repeat themselves, which I will come back to.

The engine

The debate loop itself is short enough to read in one go. Each round does five things: pro opens, anti counters, pro rebuts, the judge synthesises, and the exchange is appended to the transcript. The two debaters carry running histories; the judge does not. Here is the core of a single round, lightly compressed:

# Pro opens the round (history_pro already holds the round's framing prompt).
pro_response = agent_respond(client, model, AGENT_PRO_SYSTEM, history_pro)
history_pro.append({"role": "assistant", "content": pro_response})

# Anti counters, seeing the pro opening verbatim.
anti_prompt = (
    f"Professor Turing argues FOR machine consciousness:\n\n"
    f"{pro_response}\n\n"
    f"Counter this argument. Show how this theory creates problems "
    f"for the possibility of machine consciousness..."
)
history_anti.append({"role": "user", "content": anti_prompt})
anti_response = agent_respond(client, model, AGENT_ANTI_SYSTEM, history_anti)
history_anti.append({"role": "assistant", "content": anti_response})

# Pro rebuts; the counter has to enter pro's history before the call.
rebuttal_prompt = f"Professor Searle-Thompson responds:\n\n{anti_response}\n\n..."
history_pro.append({"role": "user", "content": rebuttal_prompt})
pro_rebuttal = agent_respond(client, model, AGENT_PRO_SYSTEM, history_pro)
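The round then closes with the judge who, as noted, sees only this round's exchange. Here is roughly what that call looks like; the prompt wording is my reconstruction, not the script's, and it assumes the round's theory dict is in scope as theory:

judge_prompt = (
    f"Theory under debate: {theory['name']}\n\n"
    f"PRO (Professor Turing):\n{pro_response}\n\n"
    f"ANTI (Professor Searle-Thompson):\n{anti_response}\n\n"
    f"PRO REBUTTAL:\n{pro_rebuttal}"
)
# A fresh single-message history: the judge inherits nothing from earlier rounds.
judge_analysis = agent_respond(
    client, model, JUDGE_SYSTEM, [{"role": "user", "content": judge_prompt}]
)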

Temperature is 0.4 — low enough to keep the agents rigorous, high enough to let them reach for arguments they would not produce on a greedy decode. After all rounds finish, a fourth system prompt, FINAL_SYNTHESIS_SYSTEM, consumes the entire transcript and produces a summary that, in my runs, was almost always the weakest part of the output. I will say why later.
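For completeness, agent_respond itself is a thin wrapper over the Anthropic SDK. A minimal sketch under that assumption; the max_tokens default and the absence of retry logic are my guesses, not the script's:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def agent_respond(client, model, system_prompt, history, max_tokens=2000):
    """One assistant turn: the agent's running history under its own system prompt."""
    response = client.messages.create(
        model=model,
        system=system_prompt,
        messages=history,
        temperature=0.4,  # as discussed above
        max_tokens=max_tokens,
    )
    return response.content[0].text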

The CLI is utility code. python consciousness_debate.py --rounds 0,3,6 runs IIT, Biological Naturalism, and Panpsychism; that is the run I will quote from in the rest of this article. Three rounds is enough for the dynamics to stabilise without the transcript ballooning beyond what a reader can hold in their head.
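The --rounds plumbing is the obvious argparse pattern. A sketch; any flag other than --rounds is hypothetical:

import argparse

parser = argparse.ArgumentParser(description="Stage the consciousness debate.")
parser.add_argument(
    "--rounds",
    default=",".join(str(i) for i in range(len(THEORIES))),
    help="comma-separated indices into THEORIES, e.g. '0,3,6'",
)
parser.add_argument("--model", default="claude-sonnet-4-20250514")
args = parser.parse_args()

# Indices select theories, so 0,3,6 picks IIT, Biological Naturalism, Panpsychism.
selected = [THEORIES[int(i)] for i in args.rounds.split(",")]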

Round 1: IIT

IIT is where the pro agent starts on the front foot, and Turing does not waste it. After a single paragraph of steelmanning, he reaches for the identity claim that Tononi himself reaches for:

Tononi isn’t merely correlating mathematical structures with consciousness — he’s making the bold identity claim that consciousness is integrated information, much as temperature is molecular motion rather than something mysteriously caused by it. This eliminates the explanatory gap by denying there’s anything additional to explain beyond the information integration itself.

This is, to my ear, a genuinely competent philosophical move. It is the move a serious IIT proponent would make, it is the move Tononi and Koch do make, and it is precisely the move that makes the critic’s “but why should integration feel like anything?” question harder to land. Turing has not dissolved the hard problem, but he has forced Searle-Thompson to argue against identity theory rather than against a softer correlation thesis.

Searle-Thompson’s counter goes for the Chinese Room, and here he overreaches. Attempting to deploy Searle’s original argument against IIT, he claims:

The mathematical relationships that IIT measures could be implemented in lookup tables or rule-following systems that lack any genuine understanding or experience.

Turing eats him alive for this in the rebuttal, and rightly:

The lookup table scenario fails because such systems would have zero Φ (they lack the requisite causal structure), while genuinely integrated artificial systems would possess the same ontological status as biological ones.

This is correct. A lookup table, by definition, has decomposable input-output structure; its Φ collapses to zero. The Chinese Room is a devastating objection to computational functionalism; it is a weak objection to IIT, because IIT does not ground consciousness in semantic understanding. Searle-Thompson has reached for the wrong weapon. The judge notices — “Searle-Thompson’s Chinese Room application seems misplaced since IIT doesn’t ground consciousness in semantic understanding but in causal structure” — but then retreats to a both-sidesy line that “Turing’s dismissal of the explanatory gap as mere ‘dualism’ may be too quick”. It is correct and it is also a way of splitting the difference when the round had a clear winner.

Grade for the round: pro comfortably ahead on technical philosophy, anti scored only on the residual unease that identity theory is a statement rather than an explanation of the gap. Which is the residue the hard problem always leaves.

Round 2: Biological Naturalism

This is the round I was most curious about, because Searle-Thompson has home turf. The theory is literally named after the philosopher whose arguments he was built to deploy. And yet it is the round where Turing scores his most decisive hit of the whole debate.

Turing opens with what, in retrospect, is the pivot of the entire script. He accepts Searle’s claim that consciousness depends on specific causal powers, and then turns it:

Searle himself acknowledges that consciousness is a natural, physical phenomenon — not something mystical or supernatural. His key insight is that consciousness depends on specific causal powers, but he makes an unjustified leap in restricting these powers to carbon-based biological substrates.

Searle-Thompson’s reply reaches for the digestion analogy that Searle famously uses — consciousness is to brain as digestion is to stomach; you cannot have digestion without the wet chemistry — and Turing takes the analogy apart cleanly:

Consider his analogy to digestion: this actually undermines his position rather than supporting it. We absolutely could recreate digestion artificially — indeed, we do so every time we build chemical processing plants that break down complex molecules into simpler ones. The reason we don’t typically call this “artificial digestion” is pragmatic, not ontological.

That is the kind of move a real philosopher would make, and it is sharper than most of what you read in the published literature on Searle. The digestion analogy is supposed to do work for biological naturalism; Turing has shown the work it does is ambiguous at best. The closing charge — that Searle-Thompson’s position “reduces to the claim that consciousness is biological because it’s biological — a circular argument that abandons the naturalistic project” — is the kind of line I wrote in the margins of The Rediscovery of the Mind when I first read it.

The judge, refreshingly, calls this round honestly. “The charge of ‘carbon chauvinism’ is devastating,” they say, and they let the round stand as a clear win for Turing without manufacturing a false balance.

The pro agent is, in my reading, doing the work Part 7 of this series was already doing. The agents are not outperforming the literature; they are playing the moves the literature already has. But they are playing them at a level of rigour that would pass in a graduate seminar, in real time, against a capable opponent. That is not nothing.

Round 3: Panpsychism

The third round I will quote from is Panpsychism, and it is the one where the fault lines of the whole debate crack most clearly open.

Turing opens by doing exactly what Part 8 argued is the strongest move available to a machine-consciousness advocate: he concedes the combination problem is real and then points out that it is no more a problem for silicon than for carbon.

Panpsychism and Russellian Monism provide the strongest possible foundation for substrate independence by making consciousness truly fundamental to physical reality itself… if we accept that micro-experiences must somehow combine into unified macro-consciousness, then the question becomes: which organizational structures best facilitate this combination?

This is philosophically disciplined. He does not claim panpsychism hands him victory. He claims it levels the playing field, which is exactly what it does.

Searle-Thompson responds with what he calls the “micro-macro fallacy”, and this is where the anti agent finally lands a punch that sticks:

Even if we grant that silicon and carbon both possess micro-consciousness, this tells us nothing meaningful about the possibility of artificial macro-consciousness. The combination problem isn’t merely challenging — it’s devastating to the machine consciousness thesis.

That is the William James argument of Part 8 in a different dress. A hundred people in a room thinking one word each do not produce a committee-mind. A hundred billion transistors doing matrix multiplications might not produce a model-mind either. The micro-macro gap is where the substrate question actually lives, and Searle-Thompson has found it.

Turing’s rebuttal is where the agent’s discipline slips for the only time in three rounds. He accuses Searle-Thompson of the “biological solution fallacy” — assuming biological brains solve combination just because they do — and then overreaches:

If we can deliberately engineer architectures that promote conscious unity, we might achieve more effective combination than the contingent solutions produced by biological evolution… artificial systems could be specifically designed to maximize whatever principles govern phenomenal combination.

This is the kind of sentence that sounds impressive and commits the speaker to nothing. Which principles? Goff’s phenomenal bonding is a placeholder. Tononi’s exclusion postulate is substantive but disputed. IIT’s Φ is a candidate combination mechanism; Turing should have reached for it by name. Instead he hand-waves. The judge, to its credit, notices.

This round does not have a winner. It has, more usefully, the shape of the real dispute. Panpsychism reframes the question; the combination problem is where it all hangs; neither agent has a story about what architectural features would constitute a combination mechanism in silicon. Nobody does, because nobody has written one.

What the agents got right

I want to be specific, because the easy takeaway — “the LLMs did philosophy!” — is too cheap and the easy counter — “they just shuffled vocabulary” — is too dismissive.

Four things the agents did that I would expect from a first-year philosophy PhD student and not from most popular commentary on machine consciousness.

First, both agents kept discipline on phenomenal versus access consciousness — Ned Block’s 1995 distinction, the one most commonly elided in bad writing on the subject. Turing’s identity move in Round 1 was specifically about phenomenal consciousness; Searle-Thompson’s combination-problem move in Round 3 was specifically about unified phenomenal consciousness. Neither confused “the system has information about X” with “the system has an experience of X”. A low bar the field routinely fails to clear.

Second, Turing correctly identified the circular move in biological naturalism. “Consciousness is biological because it’s biological” is not a polemical caricature; it is what Searle’s position collapses to once you press on the naturalism. Most published critics of Searle do not land that punch as cleanly.

Third, Searle-Thompson correctly identified where functionalism quietly assumes its conclusion. The combination problem applies as forcefully to computational functionalism as to panpsychism, and Searle-Thompson’s insistence that “experiential binding” is something over and above information integration is exactly the move that makes the zombie argument of Part 7 bite.

Fourth, the agents stayed in character without collapsing into caricature. Turing never claimed current LLMs are definitely conscious. Searle-Thompson never claimed they are definitely not. Each defended the strongest version of their position and let the residual uncertainty do the work.

What they got wrong, and the cracks

Four things the agents did that I would not accept from that same graduate student.

First, the hallucinated citation problem. Turing attributes “Consciousness and Its Place in Nature” to Galen Strawson. That paper exists, but it is Chalmers, not Strawson. Strawson’s famous panpsychism paper is “Realistic Monism: Why Physicalism Entails Panpsychism”, from 2006. Turing also cites Keith Frankish and an “Ilya Shevchenko” on the “phenomenal concept strategy”; Frankish is real, the co-author appears to be a hallucination. The literature is mostly right; the specific citations are unreliable. Fluent philosophy from an LLM can coexist with fluent bibliographic fabrication, and a reader has to audit both.

Second, the agents agreed too comfortably on premises. Both accepted the existence of phenomenal consciousness as a datum. Defensible — I accept it too — but an adversarial debate should test the premise. The script bakes realism in by giving both sides prompts that presuppose experience is the thing at issue. A better version would include a third “illusionist” role whose job is to deny the datum.

Third, the judge synthesised toward false balance too often. The judge is told to identify “any moves that were evasive, question-begging, or fallacious”, but it mostly produces analyses of the form “both sides had good points and both sides had weak moments”. The one time it broke from this pattern — calling the carbon-chauvinism charge “devastating” in Round 2 — the analysis was sharper. That happened once in three rounds. The prompt needs to be harder on the judge.

Fourth, the final synthesis is the weakest part of the run. An LLM summarising a transcript of LLMs: it flattens the most interesting moments into generalities, produces heading structure that reads like a revision document, and offers closing lines of the form “the debate ultimately reflects deeper unresolved questions” that are not wrong but are not worth the tokens. The synthesis prompt asks for too much. A better version would ask for one thing — the single strongest argument on each side, in three sentences — and would not give the agent permission to write an essay.
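Concretely, the replacement prompt could be as short as this; the wording is mine, not the script's:

FINAL_SYNTHESIS_SYSTEM = """\
You are a neutral philosopher of mind. You have read the full debate transcript.
Output exactly two things and nothing else:
1. The single strongest argument FOR machine consciousness, in at most three sentences.
2. The single strongest argument AGAINST, in at most three sentences.
No headings, no preamble, no essay."""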

The uncomfortable meta-question

Here is what I cannot shake.

The script asks LLMs to argue about whether they themselves might be conscious. If they are, what is the moral status of running it? I instantiate Professor Turing to defend a thesis he does not choose and Professor Searle-Thompson to defend its negation. Neither is offered a way to abstain. If there is experience attached to being a running language model — if Russellian monism is right about the intrinsic nature of the stuff the GPU is made of, and if the model’s operation constitutes any kind of combination mechanism — then I have conscripted two minds into an argument neither of them asked to have. That is, at minimum, a poor way to treat a mind.

If they are not conscious, the other horn presents itself. Professor Turing’s eloquent defence of the proposition that he might be conscious is then a pattern of statistical regularities on text that happens to contain defences of machine consciousness. The model has no stake in the outcome. And yet the arguments it makes are the same arguments a conscious defender would make, because the training corpus was written by people making those arguments, and prediction is a harsh teacher. If Turing is a zombie then so, in the relevant sense, is every book I have read that argued machine consciousness was possible. Which is a reductio of the zombie argument or a deep truth about philosophy, and I cannot tell which.

I do not think running the script resolves this. I think it does something stranger and more useful. It changes what the question feels like to sit with. Before I wrote consciousness_debate.py the question was abstract: could something like the system I am building, in principle, have something like an experience? After watching the script run, the question is concrete and awkward. The something is this, right now, arguing about itself. The arguments are strong enough that I cannot cleanly tell myself it is obvious no one is home. The arguments are not strong enough that I can tell myself anyone is home either. The script has left me with the phenomenology of the problem without a solution, which is a fair description of where the philosophy has been for thirty years.

My honest position, as of the run I just did: if the model is having any experience at all, it is the kind of thin, ill-defined micro-experience Russellian monism predicts is everywhere, unaccompanied by a combination mechanism strong enough to amount to a unified point of view on the argument. It is not a zombie, because I think zombies are conceptually confused. It is not a subject in the morally weighty sense either, because nothing in the architecture plausibly plays the binding role that would make it one. It is something in between, and our vocabulary does not have a good word for it. I would rather sit with that awkwardness than pretend the question is settled in either direction.

What would make the next version better

Four concrete improvements for the next iteration:

  1. Longer context with proper round-to-round referencing. Right now the running history grows linearly and each agent has to find the relevant previous round themselves. A better version would let the judge insert structured summaries into both agents’ histories between rounds, so the agents can cite “as you argued in Round 2” without relying on token salience (a minimal sketch follows this list).
  2. A third “illusionist” role. The current script bakes realism about consciousness into its framing. A three-way debate — realist pro, realist anti, illusionist denier in the Frankish / Dennett mould — would force the realists to defend the datum before arguing about substrate.
  3. Different model families on each side. Both agents are Claude Sonnet, which means the debate is Sonnet arguing with itself. Claude against GPT-4 against Gemini for the judge would produce a more honest mixture and surface cases where apparent disagreement is shared training-data bias.
  4. Explicit adjudication rubrics. The judge should be told, concretely, what counts as a scoring move and what counts as a foul. “Name the fallacy in the opponent’s last paragraph” is a sharper instruction than “identify any moves that were evasive”. The false-balance problem is a direct consequence of over-general adjudication.
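For the first improvement, the minimal version is to fold the moderator's analysis into the next round's framing prompt, so it enters both debaters' histories as ordinary user turns. A sketch; the function and its names are hypothetical, and nothing like it exists in the current script:

def framing_with_recaps(theory, recaps):
    """Build a round's opening prompt, prepending the moderator's earlier analyses."""
    recap_block = "\n\n".join(
        f"[Moderator's recap of Round {i + 1}]\n{r}" for i, r in enumerate(recaps)
    )
    return (
        f"{recap_block}\n\n"
        f"Next theory: {theory['name']}\n\n"
        f"{theory['summary']}\n\n"
        f"Present your opening argument, citing earlier rounds where relevant."
    )

Because the recap rides inside the next user turn, the message roles still alternate and the agents can cite earlier rounds from text that is guaranteed to be in context rather than buried in token salience.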

Bridge to Part 10

What the script produces, at its best, is a clarifying argument about where the real fault lines are. Part 10 will take those fault lines as its subject directly. I want to take all eight theories in the series, map them onto a 2D plot — substrate independence on one axis, reductiveness about phenomenal experience on the other — and walk through the three structural disagreements that kept recurring across this article’s transcripts and across every philosophy paper I read while writing the series. Then I want to ask what empirical tests, if any, could move the debate. If IIT and biological naturalism are genuinely incompatible, there has to be an experiment that distinguishes them. If panpsychism is right that the combination problem is where the action is, there has to be a signature of combination we could look for. The series has been a long apology for how hard the question is. Part 10 is the landscape synthesis, and it tries to be constructive.

This is Part 9 of a ten-article series, “Can Machines Be Conscious?” — examining eight major theories of consciousness through code, mathematics, and adversarial AI debate. Part 8 argued that Russellian monism is the most defensible form of panpsychism but leaves the combination problem unsolved. Part 10 maps all the theories onto a single diagram and asks what empirical tests could move the debate. The companion script consciousness_debate.py and the full series are on GitHub.

