Deconstructing Agent Skills: A LangGraph Deep Dive

The recent introduction of Agent Skills by Anthropic caught my attention — not because of what it claimed, but because of what it implied.

At first glance, the idea seems almost trivial:

Give an LLM a folder of instructions, and it becomes more capable.

But that immediately raises a deeper question.

If experience keeps showing us that no single agentic architecture can solve every problem, how does this approach suddenly enable domain expertise, structured workflows, and reuse across tasks?

There has to be something more going on beneath the surface.

Curious to understand this better, I turned to an open-source implementation — DeepAgents, a package built on top of LangGraph. Unlike closed systems, this gave me a chance to actually trace how “skills” are used in practice.

What Are Agent Skills — And Why Do They Matter?

Before diving into the system itself, it’s important to understand what skills actually are!

According to the official definition, Agent Skills are a lightweight, open format for extending AI agent capabilities with specialized knowledge and workflows.

In practice, this is surprisingly simple.

You provide the model with a folder that contains:

  • a SKILL.md file
  • and optionally, supporting scripts or documentation

The SKILL.md acts as the brain of the skill — it encodes domain-specific workflows that guide the model step-by-step toward solving a problem.

And this simple abstraction solves three fundamental limitations of vanilla LLMs:

  • Domain Expertise
    You can inject highly specific knowledge and structured workflows into the system.
  • Repeatable Workflows
    The agent doesn’t “wing it” every time — it follows consistent steps for multi-stage problems.
  • Cross-Agent Reuse
    Skills are portable. Share the folder, and another agent instantly gains the same capability.

🔍 Why this works (subtle but important)

What makes this powerful is not just the format — but the idea that:

We are no longer relying purely on model weights for intelligence, but augmenting them with structured, external reasoning artifacts.

And that’s where things start to get interesting…

Progressive Disclosure: The Hidden Trick

To make this concrete, let’s look at what a skill actually looks like in practice.

Here’s a real example from a DeepAgents setup built on top of LangGraph:

---
name: langgraph-docs
description: Use this skill for requests related to LangGraph in order to fetch relevant documentation to provide accurate, up-to-date guidance.
---

At the top, we have a simple YAML block — just a name and a description.

Nothing fancy. No embeddings. No fine-tuning.

And yet, this tiny piece plays a critical role.

Below that, the skill expands into actual instructions:

# langgraph-docs

## Overview

This skill explains how to access LangGraph Python documentation to help answer questions and guide implementation.

## Instructions

### 1. Fetch the Documentation Index

Use the fetch_url tool to read the following URL:
https://docs.langchain.com/llms.txt

This provides a structured list of all available documentation with descriptions.

### 2. Select Relevant Documentation

Based on the question, identify 2-4 most relevant documentation URLs from the index. Prioritize:

- Specific how-to guides for implementation questions
- Core concept pages for understanding questions
- Tutorials for end-to-end examples
- Reference docs for API details

### 3. Fetch Selected Documentation

Use the fetch_url tool to read the selected documentation URLs.

### 4. Provide Accurate Guidance

After reading the documentation, complete the user's request.

🧠 Here’s the interesting part

The agent does not read this entire file upfront.

Instead, it first looks only at the YAML frontmatter:

  • name
  • description

Based on that, it decides:

“Is this skill relevant to the user’s query?”

Only if the answer is yes, it proceeds to read the full file.

This pattern is called progressive disclosure.

And it’s a subtle but powerful idea:

  • You don’t overload the model with all possible knowledge
  • You let it select first, then expand context only when needed

👉 This is how skills scale without blowing up the context window.
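
To make this concrete in code, here is a minimal sketch of a progressive-disclosure loader. The folder layout (one SKILL.md per skill directory) matches the format above, but every function name and helper here is my own illustration, not DeepAgents’ actual implementation:

from pathlib import Path

import yaml  # pip install pyyaml

def read_frontmatter(skill_file: Path) -> dict:
    """Parse only the YAML block between the leading '---' markers."""
    _, frontmatter, _body = skill_file.read_text().split("---", 2)
    return yaml.safe_load(frontmatter)

def catalog_skills(skills_dir: Path) -> dict[str, str]:
    """The cheap, always-loaded view: skill name -> description."""
    catalog = {}
    for skill_file in skills_dir.glob("*/SKILL.md"):
        meta = read_frontmatter(skill_file)
        catalog[meta["name"]] = meta["description"]
    return catalog

def load_skill_body(skills_dir: Path, name: str) -> str:
    """The expensive view, read only after the agent judges the skill relevant."""
    return (skills_dir / name / "SKILL.md").read_text().split("---", 2)[2]

The catalog costs a few dozen tokens per skill; the full body costs hundreds or thousands, and the agent only ever pays for the relevant ones.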

🚧 But this raises a bigger question

So far, what we’ve seen is just:

  • a folder
  • a markdown file
  • some instructions

That alone shouldn’t be enough to:

  • break down complex problems
  • execute multi-step workflows
  • manage memory
  • or coordinate multiple tasks

Yet… it works.

Which means the real intelligence is not just in the skill —
it’s in the system that uses the skill.

🔍 Peeking Under the Hood: DeepAgents + LangGraph

To understand that system, I looked into DeepAgents, an open-source package built on top of LangGraph.

What it reveals is that skills are just one layer in a much larger architecture.

Under the hood, DeepAgents combines multiple mechanisms:

  • A todo-list based planner that breaks problems into smaller steps
  • A tool-driven execution layer that can read files, run code, and fetch data
  • A general-purpose subagent system that can delegate complex tasks
  • And a memory layer combined with a summarization tool that keeps context under control

Together, these create something much more interesting than a single agent.

They create a structured reasoning system.

💡 Why this matters

This is the key shift:

We are moving from “LLMs that answer questions” to “systems that execute workflows”.

  • Skills provide what to do
  • The system decides when and how to do it

And that distinction is everything.

Now let’s move from concepts to something more concrete.

Here’s a minimal example of how an agent is created using DeepAgents:

from deepagents import create_deep_agent
from langgraph.checkpoint.memory import MemorySaver
from deepagents.backends.filesystem import FilesystemBackend

# Checkpointer is REQUIRED for human-in-the-loop
checkpointer = MemorySaver()

agent = create_deep_agent(
    model="google_genai:gemini-3.1-pro-preview",
    backend=FilesystemBackend(root_dir="/Users/user/{project}"),
    skills=["/Users/user/{project}/skills/"],
    interrupt_on={
        "write_file": True,   # Default: approve, edit, reject
        "read_file": False,   # No interrupts needed
        "edit_file": True,    # Default: approve, edit, reject
    },
    checkpointer=checkpointer,  # Required!
)

result = agent.invoke(
    {
        "messages": [
            {
                "role": "user",
                "content": "What is langgraph?",
            }
        ]
    },
    config={"configurable": {"thread_id": "12345"}},
)

At first glance, this looks deceptively simple.

You pass:

  • a model
  • a backend
  • a path to skills

…and you get an “intelligent agent”.

But this is where things get even more interesting.

Because this function is not just creating an agent — 
it is assembling an entire architecture.

🧠 What’s actually happening under the hood?

If you dig into create_deep_agent, you’ll find something like this:

gp_middleware: list[AgentMiddleware[Any, Any, Any]] = [
    TodoListMiddleware(),
    FilesystemMiddleware(
        backend=backend,
        custom_tool_descriptions=_profile.tool_description_overrides,
    ),
    create_summarization_middleware(model, backend),
    PatchToolCallsMiddleware(),
]
if skills is not None:
    gp_middleware.append(SkillsMiddleware(backend=backend, sources=skills))

# Add provider-specific middleware, if any
gp_middleware.extend(_resolve_extra_middleware(_profile))

# Strip excluded tools after all tool-injecting middleware has run
if _profile.excluded_tools:
    gp_middleware.append(_ToolExclusionMiddleware(excluded=_profile.excluded_tools))
# Prompt caching is unconditional: "ignore" silently skips non-Anthropic models
gp_middleware.append(AnthropicPromptCachingMiddleware(unsupported_model_behavior="ignore"))

# Permissions must be last so they see all tools from prior middleware
if permissions:
    gp_middleware.append(_PermissionMiddleware(rules=permissions, backend=backend))

general_purpose_spec: SubAgent = {  # ty: ignore[missing-typed-dict-key]
    **GENERAL_PURPOSE_SUBAGENT,
    "model": model,
    "tools": _tools or [],
    "middleware": gp_middleware,
}

This is the real story.

👉 The agent is not a single model call.
👉 It is a pipeline of middleware layers.

Each layer adds a specific capability.

🔍 Let’s break this down (this is the core insight)

1. Planning Layer — TodoListMiddleware

This is one of the most underrated pieces.

It gives the agent a write_todos tool to:

  • break problems into steps
  • track progress
  • move systematically

If you think about it, this is exactly how humans handle complexity.

We don’t solve problems in one shot.
We create a plan.

This middleware forces the agent to do the same.
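
To see the shape of this, here is a hedged sketch of a write_todos-style tool built with LangChain’s @tool decorator. The real middleware’s schema is richer (it tracks per-item status in agent state); everything here beyond the tool’s name is my own simplification:

from langchain_core.tools import tool

# Module-level store standing in for the agent state the middleware manages.
_todos: list[dict] = []

@tool
def write_todos(todos: list[str]) -> str:
    """Replace the current plan with a fresh list of steps to execute."""
    _todos[:] = [{"content": t, "status": "pending"} for t in todos]
    return f"Plan updated: {len(todos)} steps recorded."

The important part is not the storage; it is that planning becomes an explicit, inspectable tool call rather than something buried in the model’s hidden reasoning.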

2. Execution Layer — FilesystemMiddleware

This is where the agent becomes actionable.

It introduces tools like:

  • read_file
  • write_file
  • edit_file
  • grep, glob
  • even execute

Now the agent is no longer just “thinking” —

👉 it can interact with its environment

This is also how skills become real:

  • Skill says: “read this file, run this script”
  • Agent actually does it using these tools
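
Here is a rough sketch of how one such tool can be scoped to a backend root, in the spirit of FilesystemMiddleware. The ROOT constant and the sandbox check are illustrative assumptions; the real tools also handle things like truncation and backend abstraction:

from pathlib import Path

from langchain_core.tools import tool

ROOT = Path("/Users/user/project")  # stands in for the backend's root_dir

@tool
def read_file(path: str) -> str:
    """Return a file's contents, resolved safely against the backend root."""
    target = (ROOT / path).resolve()
    if not target.is_relative_to(ROOT.resolve()):
        return "Error: path escapes the agent's sandbox."
    return target.read_text()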

3. Memory Layer — Summarization

Long conversations break LLMs.

This middleware solves that by:

  • compressing old context
  • storing it externally
  • keeping only what’s relevant

So the agent can scale beyond context limits.
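
A minimal version of that idea looks like the sketch below. The threshold, the prompt wording, and the choice to keep the last few messages verbatim are all illustrative assumptions on my part; notice that create_summarization_middleware in DeepAgents also receives the backend, presumably so compressed context can be stored externally:

from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage, SystemMessage

model = init_chat_model("google_genai:gemini-2.5-flash")  # any chat model works

def compact_history(messages: list, keep_last: int = 10) -> list:
    """Swap older messages for a model-written summary once history grows."""
    if len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    summary = model.invoke([
        SystemMessage("Summarize this conversation. Keep decisions, facts, and open tasks."),
        HumanMessage("\n".join(str(m) for m in old)),
    ]).content
    return [SystemMessage(f"Summary of earlier conversation:\n{summary}"), *recent]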

4. Knowledge Layer — SkillsMiddleware

This is where your earlier “skills folder” comes into play.

What this middleware does is subtle but powerful:

  • It exposes only skill metadata (name + description)
  • Lets the agent decide relevance
  • Loads full instructions only when needed

That’s the progressive disclosure we saw earlier.

Notice something important:

👉 It does NOT create new tools; the agent loads skill files using the existing read_file tool
👉 It does NOT execute anything automatically

It simply guides the agent’s decisions.
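
In code, that guidance is plausibly nothing more than a rendered block of text appended to the system prompt. Reusing the catalog_skills sketch from earlier, it might look like this (the exact wording DeepAgents injects is an assumption on my part):

def skills_prompt_section(catalog: dict[str, str]) -> str:
    """Render the metadata-only view of skills into the system prompt."""
    lines = ["## Available skills (read the full SKILL.md before using one):"]
    for name, description in sorted(catalog.items()):
        lines.append(f"- {name}: {description}")
    return "\n".join(lines)

# e.g. "- langgraph-docs: Use this skill for requests related to LangGraph ..."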

🤯 The missing piece: delegation

After all this, there’s still one more layer that explains how complex problems are handled.

DeepAgents introduces a general-purpose subagent as a tool.

Instead of solving everything in one thread, the agent can:

  • spawn a new agent
  • give it a focused task
  • receive only the final result

Think of it like this:

Instead of thinking harder… the system splits thinking into multiple isolated workers.

This has huge advantages:

  • cleaner reasoning
  • reduced context overload
  • better scalability
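
The mechanism is easy to picture as a tool that wraps a fresh agent. This sketch uses LangGraph’s prebuilt create_react_agent as the worker; DeepAgents’ actual subagent system is richer (named subagents, shared middleware), so treat this as the isolation idea only:

from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

def make_task_tool(model, worker_tools):
    """Expose 'spawn a subagent' to the parent agent as an ordinary tool."""

    @tool
    def task(instructions: str) -> str:
        """Delegate a focused task to an isolated worker agent."""
        worker = create_react_agent(model, worker_tools)  # fresh, empty context
        result = worker.invoke({"messages": [("user", instructions)]})
        return result["messages"][-1].content  # parent sees only the final answer

    return task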

Putting It All Together (System Flow)

A single query now flows through the whole stack: the planner writes todos, the skills catalog surfaces a relevant skill, the agent reads the full SKILL.md and executes its steps with filesystem tools, heavy subtasks get delegated to subagents, and summarization keeps the growing context in check.

What This Changes About How We Think About AI

What started as a simple curiosity about Anthropic’s skills led to a much deeper realization.

Skills alone are not the breakthrough.

They are just one piece of a larger idea.

The real shift is this:

We are no longer building smarter models.
We are building better systems around models.

When you look closely at implementations like DeepAgents on top of LangGraph, a pattern emerges:

  • Planning is externalized (todo lists)
  • Knowledge is modular (skills)
  • Execution is tool-driven
  • Memory is managed
  • Work is delegated (subagents)

Individually, none of these are new.

But together, they create something powerful:

A system that behaves less like a chatbot…
and more like a structured problem solver.

This also explains something subtle but important:

Anthropic didn’t just introduce a feature.

They introduced a design philosophy.
