
Most guides cover 4 or 5 patterns. The real landscape has over 30. Here is the full picture.
Every week, someone asks me whether they need a multi-agent system. About 80% of the time, the answer is no. But knowing when to say yes, and which architecture to reach for, requires seeing the full map. Not just the handful of patterns your framework happens to support, but the entire spectrum from classical cognitive architectures to the latest deep research agents.
The problem is that this map does not exist in one place. Academic surveys are thorough but unreadable. Framework documentation only covers what that framework sells. Blog posts recycle the same five patterns. So I decided to lay it all out, every distinct architectural pattern I have encountered, organised by what actually makes them structurally different from each other.
This is not a list of use cases like “customer support agent” or “code review agent.” Those are applications. This is about the underlying structural blueprints that make an agent behave fundamentally differently depending on which one you pick.
Let us get into it.

The Split Nobody Talks About
Before any pattern selection, you need to understand the deepest fault line in agent design. Every architecture you will ever encounter descends from one of two lineages.
The first is the Symbolic lineage. Think BDI, SOAR, ACT-R. These systems rely on algorithmic planning, persistent state, and explicit internal logic. They dominated before roughly 2022. They are still the right choice in safety critical domains like healthcare, robotics, and defence, where you need deterministic, auditable, and provable behaviour.
The second is the Neural lineage. Think ReAct, LangGraph, AutoGen, CrewAI. These systems rely on stochastic generation and prompt driven orchestration. Agency is not hardcoded. It emerges from how you wire prompts, tools, and memory together. This lineage took over after LLMs arrived and dominates everything you see in production GenAI today.
Most practitioners only know the neural side. That is fine for 90% of use cases. But if you ever find yourself designing for regulated industries, long-horizon planning, or safety critical decisions, the symbolic lineage has decades of solved problems you should not reinvent (if you have worked in that space, do share your experiences).
The Classical Foundations (Yep! Still Relevant)
I will keep this brief because most readers want the modern patterns. But these four architectures are the intellectual ancestors of everything that follows, and understanding them prevents you from accidentally reinventing the wheel.
BDI (Belief-Desire-Intention) separates what the agent believes about the world, what it wants, and what it has committed to doing. That three way split sounds academic until you realise that every LLM agent that drifts off-task is essentially an agent without an intention mechanism. If you have ever added “stay focused on the original goal” to a system prompt, you were patching the absence of BDI style commitment.
SOAR runs a six step decision cycle and does something remarkable: when it gets stuck, it automatically creates a sub-goal and learns from the resolution. That “impasse creates sub-goal creates learning” loop is exactly what Reflexion reinvented decades later.
ACT-R models memory with activation based retrieval where memories have strength values that decay over time. If you have ever designed a RAG system with recency weighting or relevance scoring, you were implementing ACT-R’s core insight.
Subsumption Architecture (Brooks, 1986) threw out central planning entirely. Instead, it stacked simple reactive behaviours on top of each other, with higher layers overriding lower ones. No planner, no world model. Intelligence emerges from layered reflexes. This is the philosophical ancestor of swarm architectures.
Single-Agent Reasoning (How One Agent Thinks)
This is where most practitioners start, and where the choice of pattern has the most direct impact on cost, latency, and task success rate.
ReAct is the default. The agent interleaves reasoning and action in a tight loop: think, act, observe the result, think again. It is flexible, intuitive, and the pattern most frameworks give you out of the box. The downside is that it plans one step at a time, which means it can be locally optimal but globally inefficient. It is also token expensive because every step requires a full LLM call with the entire conversation history.
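The loop itself is tiny. Here is a minimal sketch where the LLM call is replaced by a scripted stub policy and the tool is a toy lookup table; in a real system `llm_step` would be a model call that sees the full transcript each turn, which is exactly why ReAct gets token-expensive.

```python
def lookup(term: str) -> str:
    # Hypothetical tool: a toy knowledge base standing in for search.
    kb = {"capital of France": "Paris", "population of Paris": "about 2.1 million"}
    return kb.get(term, "no result")

def llm_step(transcript: str) -> tuple[str, str]:
    # Stubbed reasoning policy: decide the next action from the transcript.
    if "Paris" not in transcript:
        return ("act", "capital of France")
    if "2.1 million" not in transcript:
        return ("act", "population of Paris")
    return ("answer", "Paris, population about 2.1 million")

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        kind, payload = llm_step(transcript)   # think
        if kind == "answer":
            return payload
        observation = lookup(payload)          # act
        transcript += f"\nAction: {payload}\nObservation: {observation}"  # observe
    return "gave up"

print(react("What is the capital of France and how big is it?"))
```

Note that the transcript grows every iteration and is re-sent on every step — that accumulation is the pattern's cost model in miniature.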
Plan-and-Execute takes the opposite approach. A planner LLM generates the entire multi step plan upfront, and a lighter executor carries out each step without consulting the expensive model again. This is faster and cheaper for structured tasks with clear sequential dependencies. The weakness is rigidity. If step 3 produces an unexpected result, the agent has no natural mechanism to adapt unless you bolt on a replan step.
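To make the contrast concrete, here is a sketch with both roles stubbed: one upfront planning call, then cheap sequential execution with no replanning. The function names and the plan contents are illustrative, not from any framework.

```python
def planner_llm(goal: str) -> list[str]:
    # Stub: an expensive model would emit the full multi-step plan once.
    return ["fetch data", "summarise data", "format report"]

def executor(step: str, context: dict) -> dict:
    # Stub: a lighter model or plain tool runs each step without
    # consulting the planner again.
    context[step] = f"done: {step}"
    return context

def plan_and_execute(goal: str) -> dict:
    plan = planner_llm(goal)      # single upfront planning call
    context: dict = {}
    for step in plan:             # rigid sequential execution
        context = executor(step, context)
    return context

result = plan_and_execute("weekly report")
```

The rigidity is visible in the `for` loop: if a step surprises you, nothing here adapts unless you add an explicit replan branch.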
ReWOO (Reasoning Without Observation) pushes the separation further. The planner writes the complete plan with variable placeholders, workers execute everything in batch, and a solver synthesises the results. No mid-execution reasoning at all. This is the architecture you want for high-throughput batch processing where thousands of documents need the same multi step treatment.
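The placeholder mechanism is the distinctive part, so here is a sketch of it: a stubbed planner emits steps with `#E1`-style variables, workers run with earlier results substituted in, and a stubbed solver composes the evidence. The plan content and variable naming are illustrative.

```python
def planner(question: str) -> list[tuple[str, str]]:
    # Stub plan: each step is (variable, tool query with placeholders).
    return [("#E1", "search: author of Dune"),
            ("#E2", "search: birth year of #E1")]

def worker(query: str) -> str:
    answers = {"search: author of Dune": "Frank Herbert",
               "search: birth year of Frank Herbert": "1920"}
    return answers.get(query, "unknown")

def solver(question: str, evidence: dict) -> str:
    return f"{evidence['#E1']}, born {evidence['#E2']}"

def rewoo(question: str) -> str:
    evidence: dict = {}
    for var, query in planner(question):
        for placeholder, value in evidence.items():  # substitute earlier results
            query = query.replace(placeholder, value)
        evidence[var] = worker(query)                # no mid-execution reasoning
    return solver(question, evidence)

print(rewoo("When was the author of Dune born?"))
```

Notice there is exactly one planning call and one solving call regardless of how many workers run — that is where the batch-throughput savings come from.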
LLMCompiler is the speed demon. Instead of a linear plan, it generates a DAG of tasks, identifies which ones can run in parallel, and executes them concurrently through a task fetching unit. A joiner step then decides whether to answer or replan. The original paper claims a 3.6x speedup over sequential execution. The tradeoff is implementation complexity.
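The core idea, stripped of the LLM parts, is dependency-aware parallel execution. This sketch assumes an acyclic task graph and uses toy task functions in place of tool calls; the scheduler runs every task whose prerequisites are satisfied concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

def run_dag(tasks: dict, deps: dict) -> dict:
    """tasks: name -> fn(results); deps: name -> list of prerequisite names.
    Assumes the graph is acyclic, as a real planner would guarantee."""
    results: dict = {}
    pending = set(tasks)
    with ThreadPoolExecutor() as pool:
        while pending:
            # Every task whose dependencies are satisfied runs in parallel.
            ready = [t for t in pending if all(d in results for d in deps[t])]
            futures = {t: pool.submit(tasks[t], results) for t in ready}
            for t, f in futures.items():
                results[t] = f.result()
            pending -= set(ready)
    return results

tasks = {
    "search_a": lambda r: 2,                           # independent
    "search_b": lambda r: 3,                           # runs alongside search_a
    "join":     lambda r: r["search_a"] + r["search_b"],
}
deps = {"search_a": [], "search_b": [], "join": ["search_a", "search_b"]}
print(run_dag(tasks, deps)["join"])  # → 5
```

In the real architecture the planner LLM emits the graph and the joiner decides whether to answer or replan; the scheduling logic above is where the claimed speedup lives.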
Self-Ask is purpose built for multi hop questions. The agent asks itself “do I need a follow up question?” and recursively decomposes complex queries into simpler sub questions, each answered via search. Simple, effective, and underused.
The Metacognitive Layer (Agents That Improve Themselves)
These patterns add self evaluation and learning on top of the reasoning loop.
Reflexion is the most important pattern in this category. After completing (or failing) a task, the agent generates a verbal self critique, identifies what went wrong, and stores that reflection in episodic memory. On the next attempt, it reads its own past critiques and adjusts. This is reinforcement learning through natural language, with no model retraining required. If your agent keeps making the same mistakes across runs, Reflexion is likely what you need.
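The memory loop is the whole trick, so this sketch stubs the actor, evaluator, and critic and keeps only the loop: a failed attempt produces a verbal critique, which the next attempt reads.

```python
episodic_memory: list[str] = []

def actor(task: str, reflections: list[str]) -> str:
    # Stub: the agent only fixes the bug after reading a critique about it.
    if any("off-by-one" in r for r in reflections):
        return "correct solution"
    return "buggy solution"

def evaluate(attempt: str) -> bool:
    return attempt == "correct solution"

def self_critique(attempt: str) -> str:
    return "I made an off-by-one error; check loop bounds next time."

def reflexion(task: str, max_trials: int = 3) -> str:
    for _ in range(max_trials):
        attempt = actor(task, episodic_memory)
        if evaluate(attempt):
            return attempt
        episodic_memory.append(self_critique(attempt))  # learn via language
    return attempt

print(reflexion("fix the sorting bug"))
```

No weights change anywhere: the "learning" lives entirely in `episodic_memory`, which is why this works with any frozen model.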
Evaluator Optimiser is the production workhorse. One LLM generates output. A second LLM evaluates it against quality criteria and provides structured feedback. The generator revises. This loop continues until the evaluator approves. Anthropic highlights this as one of the most effective patterns they see in real deployments, particularly for code generation where the evaluator can run tests.
Search-Based Reasoning (When Linear Thinking Fails)
Sometimes the problem space is too complex for a single chain of thought. These architectures explore multiple paths.
Tree of Thoughts generates several candidate reasoning paths at each step, evaluates them, prunes the weak ones, and expands the promising ones. Think of it as giving the agent the ability to brainstorm and backtrack rather than committing to its first idea. It shines on creative problems, puzzles, and any task where the right approach is not obvious upfront.
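One common way to implement this is as a beam search over thoughts. In this sketch the expansion and value function are toy stand-ins for LLM calls, and the "thoughts" are just integers so the pruning is easy to follow.

```python
def expand(path: list[int]) -> list[list[int]]:
    # Stub: each partial reasoning path branches into three candidates.
    return [path + [d] for d in (1, 2, 3)]

def score(path: list[int]) -> int:
    # Stub value function: prefer paths whose thoughts sum highest.
    return sum(path)

def tree_of_thoughts(depth: int = 3, beam: int = 2) -> list[int]:
    frontier: list[list[int]] = [[]]
    for _ in range(depth):
        candidates = [c for p in frontier for c in expand(p)]  # brainstorm
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam]                           # prune weak paths
    return frontier[0]

print(tree_of_thoughts())  # → [3, 3, 3]
```

Keeping `beam` paths alive at each depth is what gives the agent its ability to back out of a weak branch instead of committing to its first idea.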
Graph of Thoughts extends this further by allowing arbitrary connections between thoughts. Ideas from different branches can be aggregated, merged, and refined in loops. The strict tree hierarchy disappears, replaced by a flexible graph where the best parts of multiple reasoning paths combine into something better than any individual branch.
LATS (Language Agent Tree Search) is the heavyweight. It unifies ReAct, Reflexion, and Monte Carlo Tree Search into a single framework. The LLM simultaneously acts as the agent, the value function (scoring states), and the optimiser (selecting the best trajectory). It achieves state-of-the-art results on coding and QA benchmarks. The cost is real: LATS averages around 71 LLM calls per request, compared to roughly 9 for standard ReAct. Use it when quality matters more than speed or cost.
Orchestration Patterns (Wiring LLM Calls Together)
These come primarily from Anthropic’s influential “Building Effective Agents” guide and represent the structural patterns for composing multiple LLM calls into workflows.
Prompt Chaining is the simplest, a linear pipeline where each LLM call processes the output of the previous one, with optional validation gates between steps. Generate, then translate. Extract, then classify, then format.
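As a sketch, with each "LLM call" stubbed and a validation gate between steps:

```python
def generate(topic: str) -> str:
    # Stub for the first LLM call.
    return f"draft about {topic}"

def validate(text: str) -> str:
    # Gate: fail fast between steps instead of propagating bad output.
    if "draft" not in text:
        raise ValueError("generation failed")
    return text

def translate(text: str) -> str:
    # Stub for the second LLM call, consuming the first call's output.
    return f"[fr] {text}"

def chain(topic: str) -> str:
    return translate(validate(generate(topic)))

print(chain("agents"))  # → [fr] draft about agents
```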
Routing places an LLM at the front as an intelligent dispatcher. It classifies the input and directs it to the appropriate specialised handler. Easy queries go to a smaller, cheaper model. Complex ones go to a more capable one. Customer support triage is the canonical example.
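A sketch of the dispatcher, with the LLM classifier stubbed by keyword matching and two handlers standing in for a cheap and a capable model:

```python
def classify(query: str) -> str:
    # Stub: a real router would be an LLM call returning a label.
    return "complex" if "refund" in query or "legal" in query else "simple"

HANDLERS = {
    "simple":  lambda q: f"small-model answer to: {q}",
    "complex": lambda q: f"large-model answer to: {q}",
}

def route(query: str) -> str:
    return HANDLERS[classify(query)](query)

print(route("what are your opening hours?"))
print(route("I want a refund for order 123"))
```

The economics are the point: most traffic never touches the expensive handler.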
Parallelisation fans out work. Either you split a task into independent subtasks and process them simultaneously (sectioning), or you run the same task multiple times and aggregate via voting for higher reliability. Document processing across pages is the classic use case.
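Here is the voting variant sketched with deterministic stub workers; real runs would be independent LLM samples of the same prompt.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def worker(sample_id: int) -> str:
    # Stub: two of three samples agree; a real worker is an LLM call.
    return "yes" if sample_id != 1 else "no"

def vote(n: int = 3) -> str:
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(worker, range(n)))   # fan out
    return Counter(answers).most_common(1)[0][0]     # majority wins

print(vote())  # → yes
```

The sectioning variant has the same fan-out shape, but each worker gets a different slice of the input and the aggregation step concatenates rather than votes.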
Orchestrator-Workers is the dynamic version of Plan-and-Execute at the multi call level. A central orchestrator LLM decomposes the task, spawns worker LLMs, collects their results, and adapts its plan based on what comes back. Unlike prompt chaining, the orchestrator does not follow a fixed path. It maintains a living TODO list.
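The "living TODO list" is what separates this from a fixed chain, so this sketch keeps only that: a stubbed decomposition, stubbed workers, and an orchestrator that appends follow-up tasks based on what comes back.

```python
def worker(task: str) -> str:
    # Stub worker LLM.
    return f"result of {task}"

def orchestrate(goal: str) -> list[str]:
    todo = ["research", "draft"]   # initial decomposition (stubbed)
    results: list[str] = []
    while todo:
        task = todo.pop(0)
        results.append(worker(task))
        if task == "draft":        # adapt the plan from what came back
            todo.append("review")
    return results

print(orchestrate("write a post"))
```

The "review" task did not exist in the original plan — it was added mid-run, which a prompt chain cannot do.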
Multi-Agent Topologies (How Agents Organise)
This is where the architecture becomes about organisational structure rather than individual cognition.
Supervisor (Hierarchical) is the most common production pattern. A central supervisor agent receives the request, decomposes it, delegates to specialised sub-agents, validates their outputs, and synthesises a unified response. It mirrors a traditional org chart. The benefit is auditability and traceability. The risk is that the supervisor becomes a bottleneck.
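Structurally it is just a hub-and-spoke. This sketch stubs the decomposition, the specialists, and the synthesis; in a real system each lambda would be an agent with its own prompt and tools.

```python
SPECIALISTS = {
    "research": lambda q: f"findings on {q}",
    "math":     lambda q: f"calculation for {q}",
}

def supervisor(request: str) -> str:
    # Stub decomposition: a real supervisor LLM would choose subtasks.
    subtasks = [("research", request), ("math", request)]
    outputs = [SPECIALISTS[name](task) for name, task in subtasks]
    return " | ".join(outputs)   # stub synthesis step

print(supervisor("market sizing"))
```

Every result flows back through `supervisor` before anything is returned — that single choke point is both the auditability win and the bottleneck risk.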
Handoff (Peer-to-Peer) eliminates the central controller. Agents transfer control to each other directly via tool calls. The billing agent decides the refund agent should take over and passes context along. This works well for conversational systems but can become chaotic without clear handoff protocols.
Subagents (Agents-as-Tools) looks similar to Supervisor but has a critical difference. Subagents operate in complete context isolation. They receive only their specific input, do their work, and return results to the main agent. They never see each other’s context. This is the pattern you want when context contamination between specialists is a concern.
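The isolation boundary is easiest to see in code. In this sketch the main agent hands the subagent only a narrow slice of its context, and only the result crosses back; the function names are illustrative.

```python
def subagent(instruction: str) -> str:
    # The subagent sees nothing but `instruction` -- no parent history,
    # no sibling outputs. That isolation is the pattern.
    return f"answer({instruction})"

def main_agent(conversation: list[str]) -> str:
    focused_input = conversation[-1]      # only this crosses the boundary
    result = subagent(focused_input)      # context-isolated call
    return f"main agent used: {result}"

print(main_agent(["lots of history", "private notes", "summarise the report"]))
```

Contrast with Supervisor, where delegation also happens but sub-agents may share state or see accumulated context; here the boundary is absolute by construction.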
Skills (Dynamic Context Loading) is the simplest form of specialisation. A single agent loads different prompts, tools, and knowledge on demand based on the task type. It is not truly multi-agent. It is one agent that shape shifts. Often this is all you actually need before reaching for the complexity of real multi-agent coordination.
Swarm goes fully decentralised. Many simple agents communicate only with their neighbours through local signals. There is no central controller and no global state. Intelligent behaviour emerges from repeated local interactions. This is robust and scalable but extremely difficult to debug or guarantee specific outcomes.
Debate (Adversarial) pits agents against each other. Multiple agents argue opposing positions and a judge synthesises the strongest answer. Structured disagreement as a feature, not a bug. This reduces hallucination through adversarial pressure and works well for high-stakes decisions.
Crew (Role-Based) assigns agents explicit personas with defined responsibilities: researcher, writer, editor, QA reviewer. They collaborate via message-passing on shared tasks, mimicking human team dynamics. CrewAI and MetaGPT popularised this pattern.
Adaptive Agent Network lets agents self organise. No fixed hierarchy. Agents dynamically assume roles and delegate based on the task at hand. The topology reshapes itself per request. This is the most flexible but also the hardest to reason about.
Deep Research Agents (The Emerging Frontier)
This category has exploded over the past six months and represents a distinct architectural class built for long horizon, knowledge intensive synthesis.
Static Pipeline deep research agents follow fixed stages: plan, search, read, synthesise, write. Each stage is a predefined module. Simple to build but inflexible when the research question requires adaptive exploration.
Dynamic Deep Orchestrators are where things get interesting. The planner generates tasks with explicit dependency graphs. A FIFO queue manages execution order. External memory banks prevent context window overflow by extracting and storing knowledge outside the prompt. A verification step catches hallucinated plans before they execute. The synthesiser pulls from the memory bank to write the final output. This is the architecture behind systems like Gemini Deep Research, OpenAI Deep Research, and the open source mcp-agent project.
Hierarchical Multi-Agent Deep Research nests the patterns. A top-level planner coordinates multiple parallel research pipelines, each with its own search, validation, and extraction agents. Results converge through a synthesis orchestrator into a final pattern synthesiser that identifies cross-cutting themes and writes evidence backed reports. Google’s ADK implementation is a good reference for this pattern.
Final Thought
The agent architecture landscape is maturing fast. Six months ago, deep research agents barely existed as a category. A year ago, LATS and LLMCompiler were academic curiosities. Today they are appearing in production systems.
What has not changed is the fundamental principle: architecture selection is the highest leverage decision you make when building an agent system. Get it right and everything downstream becomes easier. Get it wrong and no amount of prompt engineering will save you.
I hope this map helps you make that decision with more confidence.
Every AI Agent Architecture in One Place was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.