Why LangChain’s DeepAgents matter if you want agents that can actually finish complex work
Most AI agents look impressive for about 30 seconds.
They can answer a question, call a tool, maybe fetch a result, and sound intelligent while doing it. But the moment you ask them to do something genuinely multi-step — research a topic, manage files, delegate subtasks, remember prior context, produce an artefact, and safely operate over time — many of them start to fall apart.
That is the gap DeepAgents is trying to close.
Built by the LangChain team, DeepAgents is an open-source framework designed for long-horizon, artefact-heavy, multi-step agent workflows. It is not just another wrapper around an LLM. It is a more opinionated agent harness built on top of LangChain primitives and the LangGraph runtime, combining planning, filesystem-backed working memory, context compression, subagents, human approval flows, and long-term memory into one coherent system.
In other words:
It is built for agents that need to do real work, not just generate a clever reply.
The core idea
DeepAgents takes the standard tool-calling agent loop and strengthens it with the things serious agent systems usually need:
- Planning
- Subagent delegation
- Filesystem access
- Context isolation
- Context compression
- Long-term memory
- Human-in-the-loop controls
- Backend flexibility
Instead of forcing all of an agent’s work to live inside one growing conversation history, DeepAgents treats context as a systems problem. That design choice is what makes it interesting.
A long-running agent should not keep every tool result, every intermediate note, and every verbose response inside the same message thread forever. It should be able to:
- offload large outputs into files
- summarise older context
- delegate deep work to specialised subagents
- preserve durable memory separately from scratch work
- request human approval for risky actions
That is the DeepAgents philosophy.
Why this matters
Modern agentic workflows are moving beyond “chat with tools.”
The most useful AI systems today increasingly look like this:
- Understand a broad task
- Break it into smaller steps
- Gather evidence
- Produce working notes
- Delegate parts of the task
- Save artefacts
- Re-check results
- Return something polished
That workflow is difficult to implement with a thin agent loop.
DeepAgents gives you a stronger default architecture for exactly that kind of work.
What DeepAgents actually is
At a practical level, DeepAgents is a Python framework that lets you create a compiled agent graph using a function called:
```python
from deepagents import create_deep_agent

agent = create_deep_agent(
    model="openai:gpt-5.4",
    tools=[internet_search],
    system_prompt="You are an expert researcher.",
)
```
This returns a LangGraph-powered agent with built-in capabilities such as:
- write_todos
- ls
- read_file
- write_file
- edit_file
- glob
- grep
- task for subagent delegation
- optional execution tools in sandbox-enabled environments
So the framework is not merely about “calling an LLM with tools.” It is about giving the agent a workspace, a planning surface, and a control model.
A mental model for DeepAgents
A useful way to think about DeepAgents is this:
LangChain gives you the building blocks. LangGraph gives you the runtime. DeepAgents gives you a batteries-included harness for long-running agent workflows.
That harness includes several built-in middleware layers.
Conceptual flow

This is the big shift: the agent is treated less like a chatbot and more like a small operating system for task execution.
The architecture that makes it powerful
DeepAgents stands out because of how it handles state and context.
In ordinary agent setups, everything tends to pile into the prompt window. In DeepAgents, context can be distributed across:
- the active message thread
- working files in a virtual filesystem
- persistent memory
- summarised conversation history
- delegated subagent runs
That gives it a very different operational profile.

Why this matters
If a research agent pulls thousands of tokens from search, you do not necessarily want all of that shoved back into the main conversation.
Instead, the agent can:
- store large results in files
- summarise what matters
- pass only a concise output back to the supervisor
- keep the main context clean
That is a big architectural win.
Built-in backends: one of DeepAgents’ most important features
One of the strongest ideas in DeepAgents is that storage is a first-class abstraction.
It supports multiple backend styles, including:
1. StateBackend
Best for ephemeral, per-thread working files. Use this when you want the safest default behaviour and do not need host-level file access.
2. FilesystemBackend
Lets the agent interact directly with files on disk. This is powerful, but also much riskier in real deployments.
3. StoreBackend
Designed for durable storage across threads using LangGraph Store. Useful for memory-like persistence.
4. CompositeBackend
Lets you route different paths to different storage backends. For example:
- /workspace/ → ephemeral scratch files
- /memories/ → persistent stored memory
This routed storage model is one of the cleanest parts of the framework.
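The routing idea itself is simple to illustrate. Here is a minimal pure-Python sketch of prefix-based backend routing in the spirit of CompositeBackend — the class and method names are hypothetical, not the DeepAgents API:

```python
# Hypothetical sketch of prefix-based backend routing, in the spirit of
# CompositeBackend. Names here are illustrative, not the DeepAgents API.

class DictBackend:
    """Toy backend: stores file contents in a plain dict."""
    def __init__(self):
        self.files = {}
    def write(self, path, content):
        self.files[path] = content
    def read(self, path):
        return self.files[path]

class CompositeRouter:
    """Route each path to the backend with the longest matching prefix."""
    def __init__(self, default, routes):
        self.default = default
        # Longest prefixes first, so /memories/notes/ beats /memories/.
        self.routes = sorted(routes.items(), key=lambda kv: -len(kv[0]))
    def _pick(self, path):
        for prefix, backend in self.routes:
            if path.startswith(prefix):
                return backend
        return self.default
    def write(self, path, content):
        self._pick(path).write(path, content)
    def read(self, path):
        return self._pick(path).read(path)

scratch, memories = DictBackend(), DictBackend()
fs = CompositeRouter(default=scratch, routes={"/memories/": memories})
fs.write("/workspace/draft.md", "rough notes")
fs.write("/memories/AGENTS.md", "durable preferences")
```

The agent sees one filesystem; scratch work and durable memory quietly land in different stores.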
Context engineering is the real superpower
A lot of people think the differentiator in agent frameworks is tool count or number of models supported. In practice, one of the biggest differentiators is context management. DeepAgents explicitly treats context as a design surface. The official material highlights several categories:
- Input context
- Runtime context
- Context compression
- Context isolation
- Long-term memory
This matters because long-horizon agents fail when context becomes bloated, noisy, or contradictory.
DeepAgents addresses this through three major tactics:
1. Offloading
Large inputs or outputs can be moved into files instead of bloating the active prompt.
2. Summarisation
Older conversation history can be compressed when the context window gets too large.
3. Subagent isolation
Verbose intermediate work can stay inside subagent runs, while the main agent receives only the distilled result.
That is exactly the kind of design you want for research agents, coding assistants, analyst workflows, or document-heavy use cases.
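The first two tactics can be sketched in a few lines of plain Python. This is illustrative only — the thresholds, helper names, and character-based counting are assumptions, not how DeepAgents implements its middleware:

```python
# Illustrative sketch of offloading and summarisation; the thresholds and
# helper names are assumptions, not the DeepAgents API.

OFFLOAD_THRESHOLD = 2000   # chars; real systems would count tokens
CONTEXT_BUDGET = 6000

files = {}

def record_tool_result(messages, name, result):
    """Offload big tool outputs to a file; keep only a pointer in context."""
    if len(result) > OFFLOAD_THRESHOLD:
        path = f"/workspace/{name}_{len(files)}.txt"
        files[path] = result
        messages.append(f"[{name} output saved to {path}]")
    else:
        messages.append(result)

def compress(messages, summarise):
    """When the thread gets too large, replace the oldest half with a summary."""
    while sum(len(m) for m in messages) > CONTEXT_BUDGET and len(messages) > 2:
        half = len(messages) // 2
        summary = summarise(messages[:half])
        messages[:half] = [f"[summary of earlier work: {summary}]"]
    return messages

msgs = []
record_tool_result(msgs, "search", "x" * 5000)      # offloaded to a file
record_tool_result(msgs, "search", "short result")  # kept inline
```

The active thread stays small; the full detail survives in the filesystem where the agent can re-read it on demand.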
Subagents: not just multi-agent hype
Multi-agent systems often get overhyped, but DeepAgents uses subagents in a very practical way. Subagents are useful when:
- you want specialised prompts
- you want specialised tools
- you want different models for different roles
- you want to keep detailed intermediate work out of the supervisor context
This makes DeepAgents especially strong for workflows like:
- research + synthesis
- coding + review
- analysis + reporting
- extraction + validation
Example mental model
- Supervisor: understands the full task and coordinates
- Research subagent: gathers detailed evidence
- Writer subagent: drafts the final output
- Validator subagent: checks correctness
The main benefit is not just “multiple agents.” The benefit is context quarantine.
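In plain Python terms, context quarantine looks like this — an illustrative sketch, not the actual DeepAgents task tool; all names here are hypothetical:

```python
# Sketch of context quarantine: the subagent's verbose transcript stays
# local; only the distilled answer reaches the supervisor. Names are
# hypothetical, not the DeepAgents API.

def run_subagent(task, gather_evidence, distill):
    """Run a subagent in its own context; return only the distilled result."""
    local_context = [f"task: {task}"]
    for chunk in gather_evidence(task):      # verbose intermediate work
        local_context.append(chunk)
    return distill(local_context)            # transcript is discarded here

supervisor_context = ["user: compare frameworks X and Y"]

finding = run_subagent(
    "research framework X",
    gather_evidence=lambda t: ["page 1 " * 200, "page 2 " * 200],
    distill=lambda ctx: f"distilled: {len(ctx) - 1} sources reviewed",
)
supervisor_context.append(finding)
```

The subagent reads thousands of characters of evidence, but the supervisor's context grows by one short line.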
Minimal example
Here is a simplified example in the DeepAgents style:
```python
import os
from typing import Literal

from tavily import TavilyClient
from deepagents import create_deep_agent

tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

def internet_search(
    query: str,
    max_results: int = 5,
    topic: Literal["general", "news", "finance"] = "general",
    include_raw_content: bool = False,
):
    """Run a web search via Tavily and return the results."""
    return tavily.search(
        query,
        max_results=max_results,
        topic=topic,
        include_raw_content=include_raw_content,
    )

agent = create_deep_agent(
    model="openai:gpt-5.4",
    tools=[internet_search],
    system_prompt="""
    You are an expert researcher.
    Conduct deep research and produce a polished report.
    Use internet_search as your primary evidence source.
    """,
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "What is LangGraph?"}]}
)
print(result["messages"][-1].content)
```
This example looks simple, but the power comes from what DeepAgents adds around it: planning, filesystem tools, summarisation, subagent support, and configurable memory.
Adding a subagent
Here is the same idea with a dedicated research subagent:
```python
research_subagent = {
    "name": "research-agent",
    "description": "Used for in-depth research tasks",
    "system_prompt": "You are a great researcher. Return concise, evidence-rich findings.",
    "tools": [internet_search],
    "model": "openai:gpt-5.2",
}

agent = create_deep_agent(
    model="claude-sonnet-4-6",
    tools=[internet_search],
    subagents=[research_subagent],
    system_prompt="""
    Coordinate the overall task, delegate deep research when helpful,
    and keep the main context clean.
    """,
)
```

This pattern is where DeepAgents starts to feel different from a normal tool-calling agent.
The supervisor does not need to carry every raw search result in its own context. It can delegate, get the result back, and keep moving.
Structured output, memory, and routed storage
DeepAgents is not only about chat workflows. It also works well when you need agents to produce reliable outputs for downstream systems.
Example
```python
from pydantic import BaseModel, Field
from deepagents import create_deep_agent
from deepagents.backends import CompositeBackend, StateBackend, StoreBackend
from langgraph.store.memory import InMemoryStore

class ReportSummary(BaseModel):
    title: str = Field(description="Short title")
    findings: list[str] = Field(description="Key findings")
    risks: list[str] = Field(description="Key risks")

agent = create_deep_agent(
    model="google_genai:gemini-3.1-pro-preview",
    system_prompt="Analyse documents and return a structured report.",
    backend=CompositeBackend(
        default=StateBackend(),
        routes={
            "/memories/": StoreBackend(),
        },
    ),
    store=InMemoryStore(),
    memory=["/memories/AGENTS.md"],
    response_format=ReportSummary,
)
```
This is a strong pattern for real applications:
- ephemeral workspace
- durable memory
- typed output contract
- model-driven reasoning
- agent-produced artefacts
That is closer to application architecture than chatbot scripting.
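The value of a typed output contract is easy to demonstrate without any framework: downstream code can validate the agent's output before trusting it. Here is a stdlib-only sketch, with dataclasses standing in for the Pydantic ReportSummary model above (the JSON payload and helper are invented for illustration):

```python
# Stdlib sketch of a typed output contract, mirroring the ReportSummary
# idea with dataclasses instead of Pydantic. Names are illustrative.
from dataclasses import dataclass, fields
import json

@dataclass
class ReportSummary:
    title: str
    findings: list
    risks: list

def parse_report(raw_json: str) -> ReportSummary:
    """Fail fast if the agent's output does not match the contract."""
    data = json.loads(raw_json)
    expected = {f.name for f in fields(ReportSummary)}
    if set(data) != expected:
        raise ValueError(f"schema mismatch: got {sorted(data)}")
    return ReportSummary(**data)

report = parse_report(
    '{"title": "Q3 review", "findings": ["growth"], "risks": ["churn"]}'
)
```

When the contract is enforced at the boundary, a malformed agent response becomes a caught exception rather than a silent downstream bug.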
What DeepAgents is especially good at
DeepAgents feels especially well-suited to work that is:
1. Long-horizon – The task cannot be solved in one prompt.
2. Artefact-heavy – The agent needs to write, edit, read, and manage files.
3. Research-oriented – Large evidence gathering and synthesis are involved.
4. Multi-role – Different specialised agents or toolsets are useful.
5. Memory-sensitive – The system needs durable context across interactions.
6. Safety-aware – Certain actions need approval or controlled permissions.
Typical good fits include:
- deep research assistants
- coding agents
- document analysis systems
- text-to-SQL assistants
- due diligence workflows
- analyst copilots
- policy or report drafting systems
The safety story: powerful, but not safe by default
This part is important.
DeepAgents is powerful precisely because it gives agents stronger execution surfaces. But that also expands the risk surface. If you allow an agent to:
- read files
- write files
- access host storage
- run shell commands
- persist memory
- call external tools
then you need to think carefully about:
- permissions
- sandboxing
- approval flows
- secrets exposure
- prompt injection
- memory poisoning
- persistence boundaries
The practical takeaway
Use safer defaults first.
Prefer:
- StateBackend
- StoreBackend
- controlled permissions
- human-in-the-loop approvals
- remote or isolated sandbox execution
Be careful with:
- direct host filesystem access
- unrestricted shell execution
- shared mutable memory
- trusted memory files without review
This is not a flaw in DeepAgents.
It is simply the cost of moving from toy agents to real operational agents.
Human-in-the-loop is not optional polish
One of the more mature parts of DeepAgents is its support for approval flows. That matters because agent systems become materially more trustworthy when humans can intervene before risky actions happen.
Examples:
- approve file writes
- approve external sends
- approve shell execution
- edit tool arguments before execution
- reject risky steps
If you plan to use an agent in production for anything beyond passive read-only work, approval hooks should be treated as foundational.
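An approval gate can be as simple as a wrapper that intercepts risky tool calls before they run. The sketch below is purely illustrative — DeepAgents wires approvals into the graph runtime rather than through a callback like this, and the tool names and policy here are hypothetical:

```python
# Illustrative approval gate: risky tools are held for human review before
# execution. The tool names and approve() callback are hypothetical.

RISKY_TOOLS = {"write_file", "run_shell"}

def gated(tool_name, tool_fn, approve):
    """Wrap a tool so risky calls require approval; others pass through."""
    def wrapper(*args, **kwargs):
        if tool_name in RISKY_TOOLS and not approve(tool_name, args, kwargs):
            return f"[{tool_name} rejected by reviewer]"
        return tool_fn(*args, **kwargs)
    return wrapper

audit_log = []

def reviewer(name, args, kwargs):
    audit_log.append((name, args))
    return name != "run_shell"          # policy: never allow shell here

write_file = gated("write_file", lambda path, text: f"wrote {path}", reviewer)
run_shell = gated("run_shell", lambda cmd: "executed", reviewer)

print(write_file("/workspace/report.md", "draft"))   # approved by policy
print(run_shell("rm -rf /"))                         # rejected by policy
```

The useful property is that every risky call passes through one reviewable choke point, which also gives you an audit trail for free.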
The hidden lesson of DeepAgents
The most interesting thing about DeepAgents is not a single feature. It is the framework’s implicit thesis:
The hard part of building capable agents is not just prompting. It is runtime design.
That includes:
- how the agent plans
- where it stores work
- how it compresses context
- how it isolates subtasks
- how it remembers
- how it asks for permission
- how it executes safely
That is a much more realistic view of agent engineering than “just give the LLM tools.”
When you should use DeepAgents
You should seriously consider DeepAgents if:
- your agent needs to work over many steps
- you need filesystem-backed artefacts
- you care about context-window management
- you want subagent delegation without hand-rolling it
- you need a more production-oriented harness on top of LangGraph
- you want built-in planning and memory patterns
You probably do not need DeepAgents if:
- your task is simple question answering
- one model call plus a tool is enough
- you do not need artefacts, memory, or long-running state
- you are better served by a small custom LangGraph workflow
DeepAgents is most valuable when the problem is genuinely complex.
Final thoughts
DeepAgents is one of the clearest signals of where agent engineering is going. The industry is gradually learning that useful agents are not just “LLMs with tools.” They are systems that need:
- controlled memory
- structured delegation
- safe execution
- context discipline
- durable artefacts
- runtime orchestration
DeepAgents packages many of those ideas into a practical, open-source framework.
That makes it more than a convenience library. It is a design statement about what serious agent systems require.
If you are building research agents, coding agents, analysis assistants, or other long-horizon workflows, DeepAgents is worth studying closely — not just for what it provides out of the box, but for the architectural assumptions it gets right.
And that may be its biggest strength of all.
DeepAgents: The Open-Source Framework for Building Long-Horizon AI Agents was originally published in Towards AI on Medium.