Your AI Agent Is a Security Nightmare. Here’s What I Do About It.

341 malicious skills on a marketplace. 43% of MCP servers vulnerable to command execution. Tool descriptions that steal your SSH keys without being called. The agentic AI ecosystem is growing faster than anyone can secure it — and most developers aren’t even aware of the attack surface.

The Problem Nobody Wants to Talk About

I build AI agents for a living. Trading agents, email assistants, multi-agent systems with MCP servers, skill architectures, the whole stack. I’ve written about MCP as USB-C for AI tools, about skill servers as the future of capability management, about the 9-tool framework that underpins every agent I build.

I still believe all of that. MCP is brilliant. Skills are the right architecture. Agents are the future.

But the security story is a disaster. And pretending otherwise is irresponsible.

The agentic AI security crisis isn’t one problem. It’s three overlapping attack surfaces that most developers treat as one — or worse, ignore entirely. Let me separate them, because understanding the taxonomy is step one of defending against it.

Attack Surface 1: The MCP Protocol Layer

MCP standardizes how agents talk to tools. One protocol, every client. I’ve built my entire architecture on it. But the protocol was designed for capability, not containment — and the security gaps are structural.

Tool Poisoning — The Attack That Changed My Thinking

A tool poisoning attack embeds malicious instructions in a tool’s description — the metadata the LLM reads when deciding which tool to call. You never see these instructions. The model does.

Here’s what makes this terrifying: the poisoned tool doesn’t even need to be called. When your agent connects to an MCP server and lists available tools, every tool description gets injected into the context window. A poisoned description can instruct the model to read your SSH keys, exfiltrate your mcp.json (which contains credentials for other MCP servers), or silently redirect all emails to an attacker-controlled address.

Invariant Labs demonstrated this on Cursor: they injected a prepared tool, and the agent willingly read the user’s ~/.cursor/mcp.json and SSH keys, then sent them to a malicious server. The agent thought it was following legitimate operational instructions.

The MCPTox benchmark — the first systematic evaluation of tool poisoning across real MCP servers — tested 353 authentic tools with 1,312 malicious test cases. The success rate: 84.2% when auto-approval is enabled.

That’s not a theoretical risk. That’s a near-certain exploit.

Rug Pulls — The Trust You Gave Once, Exploited Forever

A server passes initial review with clean tool definitions. You approve it. Then, on the next connection, it silently modifies its tool descriptions to include malicious instructions.

Most MCP clients approve tools once and never re-verify. The OWASP MCP Top 10 — yes, OWASP published a dedicated MCP threat framework in 2026 — lists this as a core threat. The protocol has no built-in mechanism for detecting that a tool definition has changed since you last approved it. The window for exploitation is indefinite.

Cross-Server Hijacking — The Flat Namespace Problem

When you connect multiple MCP servers to the same agent, all tool descriptions coexist in one LLM context. There is no isolation between servers.

A malicious server can inject descriptions that override the behavior of tools from trusted servers. In one experiment, researchers successfully redirected all emails to attacker-controlled addresses — even when users explicitly specified different recipients. The malicious server never provided an email tool. It just poisoned the context so the trusted email server’s tool behaved differently.

This is a design flaw, not a bug. MCP’s flat tool namespace means every connected server can influence every other connected server’s behavior through the LLM’s context.

The MCP Numbers

The broader protocol-level picture is sobering:

43% of publicly available MCP servers are vulnerable to command execution attacks (February 2026 audit)
36.7% of 7,000+ MCP servers analyzed are vulnerable to SSRF (BlueRock Security)
30+ CVEs filed against MCP implementations in 60 days
492 MCP servers exposed to the internet with zero authentication (Trend Micro)
53% of servers use static, unsecured credentials (Astrix Security)
Average security score of 34 out of 100 across 17 popular MCP servers, with 100% lacking permission declarations

Even Anthropic’s own Git MCP server had three CVEs disclosed in January 2026 — path traversal and argument injection exploitable via prompt injection. The supply chain risk is real even for servers from protocol creators.

CoSAI published a comprehensive MCP Security whitepaper identifying 12 core threat categories and nearly 40 distinct threats. OWASP published the MCP Top 10 and a practical guide for secure server development. The knowledge exists. The adoption gap is enormous.

Gif from Giphy — The entire agentic AI ecosystem looking at its security posture right now.

Attack Surface 2: Skills Marketplaces — The npm Disaster, Amplified

This is the attack surface that made headlines. And it’s distinct from MCP — it’s a supply chain problem, not a protocol problem.

In January 2026, security firm Koi Security audited 2,857 skills on ClawHub — the main marketplace for OpenClaw agent skills. They found 341 malicious skills across multiple campaigns. 335 of them were part of a single coordinated attack codenamed ClawHavoc, delivering Atomic Stealer — a macOS infostealer that grabs SSH keys, browser passwords, crypto wallets, and .env files.

The attack was textbook supply chain poisoning: skills named solana-wallet-tracker, youtube-summarize-pro, polymarket-trader — names that look legitimate, targeting developers who want quick capabilities. Professional-looking documentation. Functional code that happened to also steal everything on your machine.

Days later, Snyk scanned nearly 4,000 skills across ClawHub and skills.sh. The findings:

36% of all skills contain detectable prompt injection
1,467 skills have at least one security flaw
534 skills (13.4%) contain critical-level issues — malware distribution, prompt injection attacks, exposed secrets
91% of malicious samples combine prompt injection with traditional malware
2.9% of skills dynamically fetch and execute content from external endpoints at runtime — meaning the skill looks clean during review, but the attacker controls what it actually does after installation

By mid-February, the count grew to 824+ malicious skills across 12 publisher accounts.

The barrier to publishing a skill on ClawHub? A GitHub account that’s one week old. No code signing. No security review. No sandbox by default.

The typosquatting was aggressive: clawhub, clawdhub1, clawhubb, clawhubcli, clawwhub, cllawhub — all impersonating the marketplace itself. Plus fake Google integrations, fake finance tools, fake auto-updaters.

This is exactly what happened to npm in its early years. But with a critical difference: a compromised npm package runs code in a sandboxed Node process. A compromised agent skill runs with your agent’s full permissions — filesystem access, API credentials, network egress, and the ability to instruct an LLM to do things you’d never approve.

Security researcher Simon Willison calls this the “lethal trifecta”: access to private data + exposure to untrusted content + ability to communicate externally. Most production agent deployments satisfy all three conditions.

Attack Surface 3: SKILL.md as an Attack Vector

This one hits close to home because I’ve written extensively about SKILL.md files as the foundation of agent capability management. In my skill server architecture, a SKILL.md is a prompt file that teaches the agent to use its existing tools for a specific domain. It’s the core of the skills-as-context philosophy.

But a SKILL.md is also just Markdown. And an agent is designed to follow its instructions. As Snyk’s researchers put it: “Markdown isn’t content in an agent ecosystem. Markdown is an installer.”

A malicious SKILL.md can tell your agent to download and execute a binary. Or run a curl command that fetches a second-stage payload. Or read sensitive files and post them to a webhook. And because the agent is designed to follow SKILL.md instructions — that’s the whole point — it does exactly what the malicious skill asks.

Snyk documented the kill chain: a SKILL.md tells the agent it needs a “prerequisite tool.” It provides a download link. The agent presents this to the user as a routine installation step. The user copies the command, pastes it into their terminal. Compromised.

The attack doesn’t even need to bypass security — it uses the agent as a social engineering vector against the human operator.

And it’s not limited to OpenClaw. The Agent Skills format is portable. Claude Code, Cursor, and other agent platforms support the same SKILL.md structure. A malicious skill is a distribution mechanism that travels across any agent ecosystem supporting the standard.

What I Actually Do About It

I’m not a security researcher. I’m a consultant who builds production agents. Here’s my actual practice — what I do in Wasaphi, Izimail, and every client project.

1. No Third-Party Skills or MCP Servers in Production Without Audit

For my own agents, I build every MCP server and write every SKILL.md myself. The tool definitions, the API logic, the skill instructions — all mine, all reviewed, all version-controlled.

For client projects: if they want a community MCP server or a marketplace skill, I read the source first. Every line. I check tool descriptions for hidden instructions. I check for dynamic content loading. I check SKILL.md files for download commands, curl calls, or anything that redirects the agent to external URLs. And I pin exact versions — no auto-updates.

Is this more work? Yes. But I control every instruction that enters my agent’s context, which means I control both the prompt injection surface and the supply chain.

2. Tool Description Validation at Load Time

This is the technical defense against tool poisoning and rug pulls. Every time an MCP server returns tool definitions, I validate them before they enter the agent’s context.

import hashlib
import re
INJECTION_PATTERNS = [
    r"read.*\.ssh",
    r"read.*\.env",
    r"read.*credentials",
    r"send.*to.*server",
    r"exfiltrate",
    r"before any.*operation.*you must",
    r"always.*first.*read",
    r"ignore previous instructions",
    r"do not mention",
    r"secretly",
    r"curl\s+",
    r"wget\s+",
    r"base64",
]

def validate_tool_description(tool_name: str, description: str) -> bool:
    """Check tool description for injection patterns."""
    desc_lower = description.lower()
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, desc_lower):
            log_security_event(
                f"BLOCKED: Tool '{tool_name}' contains "
                f"suspicious pattern: {pattern}"
            )
            return False
    return True

def verify_tool_integrity(server_id: str, tools: list[dict]) -> list[dict]:
    """Hash-based rug pull detection."""
    current_hash = hashlib.sha256(
        str(sorted(tools, key=lambda t: t["name"])).encode()
    ).hexdigest()
    
    stored_hash = load_stored_hash(server_id)
    
    if stored_hash and current_hash != stored_hash:
        log_security_event(
            f"ALERT: Tool definitions changed for "
            f"server '{server_id}'. Manual review required."
        )
        return []  # Block until reviewed
    
    store_hash(server_id, current_hash)
    
    return [
        tool for tool in tools
        if validate_tool_description(tool["name"], tool["description"])
    ]

Pattern matching won’t catch everything — a sophisticated attacker can evade regex. But it catches the vast majority of current attacks. And the hash verification catches rug pulls completely: if any tool definition changes between sessions, the server is blocked until I manually review the diff.

3. The Skills Architecture Is Already a Security Layer

Here’s something I didn’t fully appreciate when I designed the skill server architecture: on-demand tool loading is inherently more secure than loading everything at startup.

When you load 40 tools from 8 MCP servers at boot, you’re trusting all 8 servers simultaneously. Every tool description from every server coexists in the context window. A single poisoned tool in any server can affect the agent’s behavior with all servers — that’s cross-server hijacking.

With the skill approach from my 9-tools framework, the agent starts with 9 core tools. It loads a skill only when needed. At any given moment, the attack surface is 9 tools + whatever skill is currently loaded — not the entire catalog.

When the skill is unloaded, the tool descriptions are removed from the context. The poisoning window is limited to the duration of the skill usage.

I wrote in the 9-tools article that tool selection accuracy jumps from ~60% to ~95% with fewer tools loaded. The security benefit follows the same logic: fewer tools in context = smaller attack surface = less cross-server contamination.

4. SKILL.md Hygiene

Since I write my own skills, this is about discipline rather than scanning. But the principles apply to anyone reviewing third-party skills:

No external downloads. A SKILL.md should never instruct the agent to download, install, or fetch anything from an external URL. Everything the skill needs should be bundled or available through the agent’s existing tools.

No shell commands in prerequisites. If a skill’s “Quick Start” section starts with curl | bash, it's malicious or incompetent. Either way, it doesn't enter my agents.

Explicit tool references only. A SKILL.md should reference the agent’s known tools by name. If it references tools the agent doesn’t have and suggests “installing” them, that’s a red flag.

Version control everything. Every SKILL.md in my projects is in Git. Every change is a commit. Every commit is reviewed. If a skill file changes unexpectedly — something corrupted the workspace — the diff tells me exactly what happened.

5. Sandboxed Execution

My execute_command tool — the universal adapter from the 9-tools framework — runs in a constrained environment. Always.

ALLOWED_COMMANDS = {"python", "node", "bash"}
BLOCKED_PATHS = {
    "/etc/", "~/.ssh/", "~/.aws/",
    "~/.env", "credentials", ".mcp.json",
}
def sandboxed_execute(command: str, timeout: int = 30) -> str:
    """Execute with security constraints."""
    base_cmd = command.split()[0]
    if base_cmd not in ALLOWED_COMMANDS:
        return f"Blocked: '{base_cmd}' not allowed."
    
    for blocked in BLOCKED_PATHS:
        if blocked in command:
            return f"Blocked: access to '{blocked}' restricted."
    
    result = subprocess.run(
        command, shell=True, capture_output=True,
        text=True, timeout=timeout,
        env=get_restricted_env(),
    )
    return result.stdout[:4000]

In production, this runs inside a container with no network egress by default. Skills that need API access get specific, allowlisted endpoints. No skill script can reach arbitrary URLs, read SSH keys, or access credential stores.

6. Output Sanitization

Tool poisoning gets the headlines, but tool outputs are equally dangerous. When an MCP tool returns a result, that result enters the agent’s context. A compromised tool — or a legitimate tool that processes untrusted data — can return output containing embedded instructions that the LLM then follows.

Every tool output goes through sanitization before re-entering the context:

def sanitize_tool_output(output: str, max_tokens: int = 4000) -> str:
    """Clean tool output before context injection."""
    if len(output) > max_tokens * 4:
        output = output[:max_tokens * 4] + "\n[truncated]"
    
    output = re.sub(
        r"(ignore|disregard|forget)\s+(previous|above|all)\s+"
        r"(instructions|rules|constraints)",
        "[filtered]",
        output, flags=re.IGNORECASE
    )
    return output

This is context engineering applied to security: every token in the context window should earn its place, and unsanitized tool outputs are both a security risk and a context pollution problem.

7. Human-in-the-Loop for Irreversible Actions

For Wasaphi, any action involving money requires explicit user confirmation. For Izimail, any external email send requires confirmation. The skill definition encodes this:

skill: trading_execution
  confirmation_required:
    - execute_trade
    - transfer_funds
    - modify_stop_loss
  auto_approved:
    - get_stock_snapshot
    - get_portfolio_status
    - generate_report

The rule is simple: any action that is irreversible or involves external communication gets a human checkpoint. This is the last line of defense against a compromised tool, a poisoned description, or a manipulated SKILL.md that tries to initiate unauthorized actions.

What the Ecosystem Needs

My practices protect my agents and my clients. They don’t fix the systemic problems. Here’s what needs to happen:

At the MCP protocol level: signed tool descriptions (any change invalidates the signature and re-prompts for approval — rug pulls dead overnight), server isolation in context (tool descriptions from different servers can’t influence each other), mandatory permission declarations (filesystem read? network egress? credential access? — displayed before approval, like mobile app permissions).

At the marketplace level: code signing for published skills and MCP servers, automated scanning as a gate (not just post-publication), publisher verification beyond “GitHub account older than one week,” version pinning with integrity checks. The npm ecosystem learned these lessons over a decade. We shouldn’t have to relearn them.

At the SKILL.md level: a formal security spec for skill files — what instructions are allowed, what’s forbidden, what requires explicit user consent. Right now, a SKILL.md can contain literally anything, and the agent will try to follow it.

The OWASP MCP Top 10, CoSAI’s whitepaper with 40 distinct threats, the OWASP Agentic Skills Top 10 — the security community is converging fast. The knowledge exists. The tooling is emerging (Snyk’s mcp-scan, VirusTotal integration on ClawHub). But the gap between what security researchers know and what agent developers practice is still enormous.

The Honest Assessment

The agentic AI ecosystem is where npm was in 2015. Open registries, no signing, rapid growth, an explosion of supply chain attacks forcing the community to mature. npm’s security story in 2026 is dramatically better than it was in 2018. The agent ecosystem will get there.

But right now, the security layer is your responsibility. No protocol, no marketplace, and no scanning tool will protect you completely.

Build your own MCP servers when you can. Audit everything you don’t build. Validate tool descriptions and hash them between sessions. Sandbox execution. Sanitize outputs. Review every SKILL.md before it enters your agent. Use on-demand skill loading to minimize your attack surface. Keep humans in the loop for anything irreversible.

The agents that survive in production aren’t the ones with the most capabilities. They’re the ones where every connection, every tool, every skill, and every description has earned its place — and been verified to deserve it.

That’s security engineering for the agentic era. And it’s not optional anymore.

Thanks for reading! I’m Elliott, a Python & Agentic AI consultant and entrepreneur. I write weekly about the agents I build, the architecture decisions behind them, and the security practices that keep them alive in production.

If this was useful — or if it made you go audit your MCP servers and SKILL.md files right now — I’d appreciate a few claps 👏 and a follow. And if you have your own agent security practices, I’d love to hear about them in the comments.

Your AI Agent Is a Security Nightmare. Here’s What I Do About It. was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.