From Chaos to Control: Bringing Engineering Discipline to AI-Powered Development

How I moved from unstructured Claude prompts to a production-ready orchestration framework — and finally got some sleep.

The Real Problem: Where is the “Engineering” in AI Coding?

We were using Claude Code SDK to generate code. It worked — Claude would write functions, modify files, and make changes. But something kept me up at night:

Where was the software engineering?

Our workflow looked like this:

# The "immature" approach
result = claude_sdk.query("Add pet insurance product to all services")
# Hope it works... 🤞
No structure. No retries. No dependency management. No state tracking. No control. No visibility into which part was changing.

It was just… prompting.

Sure, the AI was smart. But what happened when:

  • A task failed midway? Start over from scratch?
  • Dependencies existed? Hope Claude figures it out?
  • We needed to scale to 7+ services? Send one giant prompt?
  • Something went wrong? Debug a black box?

The answer wasn’t more AI — it was better architecture.

The Sleepless Nights: What’s Missing?

I couldn’t sleep. These questions haunted me:

  • How do we bring engineering discipline to AI coding?
  • How do we scale beyond single prompts?
  • How do we make this reliable, maintainable, and controllable?

I’d lie in bed, running through scenarios:

Scenario 1: Partial Failure

"Add product to 7 services"
✓ Service 1: Success
✓ Service 2: Success
✗ Service 3: Failed

How do I retry just Service 3 without re-running 1 and 2? Answer with prompts: You can’t.

Scenario 2: Dependencies

"Service D needs data from Services A, B, and C"

How do I ensure A, B, C complete before D starts? Answer with prompts: Hope Claude reads it right.

Scenario 3: Monitoring

"Update configuration across services"

Which services are done? Which are pending? What failed? Answer with prompts: You don’t know until it’s all over.

Scenario 4: Scale

"Onboard product to 20 services"

Do I write a 10,000-word prompt? Do I hope for the best? Answer with prompts: Neither scales.

This is what kept me awake. Not “how do I use AI?” but “how do I engineer with AI?”

The Paradigm Shift: AI as a Component, Not the Pilot

The breakthrough occurred when I stopped asking, “How do I make Claude do everything?” and started asking, “What if Claude is just one component in a well-architected system?”

We needed to move away from the “Giant Prompt” model toward a structured Orchestration Framework.

By building a chassis around the AI engine, we let Claude do what it’s brilliant at — understanding requirements and executing tasks — while we do what we’re brilliant at: building reliable, scalable systems.

The Maturity Gap: AI Needs Orchestration, Not Just Prompts

The breakthrough came from a shift in perspective: Stop thinking “How do I make AI do everything?” and start thinking “How do I architect a system where AI is a specialized component?”

Instead of a linear “User → Prompt → Hope” flow, we needed a structured engineering framework.

The “Aha” Moment

The breakthrough came when I stopped asking:

“How do I make Claude do everything?”

And started asking:

“What if Claude is just one component in a well-architected system?”

Instead of forcing AI to handle system concerns, we separate responsibilities:

  • Claude doesn’t manage state
    → We build a StateManager (persistent, queryable, replayable)
  • Claude doesn’t track dependencies
    → We build a DependencyGraph (DAG-based execution planning)
  • Claude doesn’t ensure execution order
    → We build an Orchestrator (deterministic control layer)
  • Claude doesn’t retry failures
    → We implement retry + backoff + idempotency
  • Claude doesn’t monitor progress
    → We build observability (logs, traces, metrics)

Claude does what it's brilliant at: understanding requirements and executing tasks.

We do what we're brilliant at: building reliable, scalable systems.

This shift—from "AI does everything" to "AI is a component"—is what made this possible.
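To make the "retry + backoff + idempotency" item concrete, here is a minimal sketch of how such a wrapper could look. The function name `run_with_retry`, its parameters, and the `completed` dictionary are my own illustrative assumptions, not the framework's actual API.

```python
import time
from typing import Callable, Dict


def run_with_retry(
    task_id: str,
    action: Callable[[], str],
    completed: Dict[str, str],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run `action` up to `max_attempts` times with exponential backoff.

    Hypothetical helper: `completed` maps task ids to stored results and
    stands in for whatever persistent store the real system uses.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    # Idempotency: a task with a recorded result is never executed again.
    if task_id in completed:
        return completed[task_id]

    for attempt in range(1, max_attempts + 1):
        try:
            result = action()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Backoff doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
        else:
            completed[task_id] = result
            return result

    raise RuntimeError("unreachable")
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real waiting. The early return on an already completed task is what lets a re-run skip work that succeeded earlier.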

The Vision: An Engineering Framework Around AI

I envisioned a system where:

  1. Requirements are structured: Not just prompts, but analyzed and decomposed
  2. Tasks have dependencies: Explicit DAG, not implicit hope
  3. Execution is controlled: Retry logic, timeouts, error handling
  4. State is tracked: Know what happened, when, and why
  5. Context flows: Tasks build on each other’s work
  6. Scale is inherent: Add services by adding tasks, not prompt length
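For the "execution is controlled" point, a timeout is the simplest control to show. The sketch below uses the standard library's `asyncio.wait_for`. The helper name `execute_with_timeout` and the shape of the returned dictionary are assumptions made for illustration only.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def execute_with_timeout(
    run: Callable[[], Awaitable[str]],
    timeout_s: float,
) -> Dict[str, str]:
    """Run one task and turn timeouts and errors into an explicit status.

    Hypothetical helper: `run` stands in for a single agent call.
    """
    try:
        result = await asyncio.wait_for(run(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # The awaited call is cancelled by wait_for when the timeout expires.
        return {"status": "failed", "error": f"timed out after {timeout_s}s"}
    except Exception as exc:
        return {"status": "failed", "error": str(exc)}
    return {"status": "completed", "result": result}
```

Because every outcome becomes a plain status record, the caller can decide in code whether to retry, skip, or stop the workflow.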

A simple example: “Calculate the sum of 5 and 3”

Old way:

result = claude.query("Calculate 5 + 3")  # ¯\_(ツ)_/¯

New way:

# 1. Planning Agent breaks it down
tasks = planning_agent.plan("Calculate sum of 5 and 3")
# → task-1: Identify operands and operation
# → task-2: Perform calculation (depends: task-1)
# → task-3: Validate result (depends: task-2)

# 2. Orchestrator manages execution
workflow = orchestrator.create_workflow(tasks)
orchestrator.execute_workflow(workflow.id)

# 3. State manager tracks everything
state = state_manager.load_workflow(workflow.id)
# → All tasks completed, results persisted, dependencies satisfied

This wasn’t just about making AI work — it was about making it engineerable.

The Vision: Let Claude Plan and Execute

The concept was elegant:

  1. User provides a requirement: “Calculate the sum of 5 and 3”
  2. A Planning Agent (powered by Claude) analyzes it and creates a task breakdown
  3. A Task Executor (also powered by Claude) executes each task in dependency order
  4. Results flow from one task to the next, building context
  5. Users see real-time progress in a beautiful CLI

Simple math calculation today → Complex multi-service workflows tomorrow.
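The planning step only helps if the plan comes back in a shape that code can check. Below is a rough sketch of how a planner response could be parsed and validated, assuming the agent replies with a JSON object holding a `tasks` list, optionally wrapped in a Markdown code fence. The function name `parse_plan` is hypothetical.

```python
import json
import re
from typing import Dict, List

# Three backticks, built programmatically so this snippet can itself be fenced.
_FENCE = "`" * 3
_FENCED_JSON = re.compile(_FENCE + r"(?:json)?\s*(\{.*\})\s*" + _FENCE, re.DOTALL)


def parse_plan(response: str) -> List[Dict]:
    """Extract the task list from a planner reply and validate dependencies.

    Assumes a payload of the form {"tasks": [{"id": ..., "dependencies": [...]}]}.
    """
    match = _FENCED_JSON.search(response)
    payload = match.group(1) if match else response
    tasks = json.loads(payload)["tasks"]

    known_ids = {task["id"] for task in tasks}
    for task in tasks:
        missing = [dep for dep in task["dependencies"] if dep not in known_ids]
        if missing:
            raise ValueError(f"{task['id']} depends on unknown tasks: {missing}")
    return tasks
```

Rejecting a plan with unknown dependencies at this point keeps a malformed model response from ever reaching the execution stage.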

The Architecture: What Actually Works

┌─────────────────────────────────────────────────────────────────┐
│                           User (CLI)                            │
│                   "Calculate sum of 5 and 3"                    │
└────────────────────────────────┬────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                      WorkflowOrchestrator                       │
│  • Creates workflow from requirement                            │
│  • Manages task execution lifecycle                             │
│  • Coordinates all components                                   │
│  • ENFORCES deterministic execution order                       │
└──────────────┬───────────────────────────────────┬──────────────┘
               │                                   │
               ▼                                   ▼
┌────────────────────────────┐       ┌────────────────────────────┐
│       PlanningAgent        │       │        TaskExecutor        │
│  • Analyzes requirement    │       │  • Executes each task      │
│  • Creates task breakdown  │       │  • Uses dependency context │
└──────────────┬─────────────┘       └─────────────┬──────────────┘
               │                                   │
               └─────────────────┬─────────────────┘
                                 │
                                 ▼
                    ┌─────────────────────────┐
                    │    ClaudeAgentClient    │
                    │  • Wraps SDK calls      │
                    │  • Loads agent files    │
                    │  • Manages async/sync   │
                    └────────────┬────────────┘
                                 │
                                 ▼
                    ┌─────────────────────────┐
                    │    claude_agent_sdk     │
                    │  • Agent execution      │
                    │  • Tool integration     │
                    └────────────┬────────────┘
                                 │
                                 ▼
                    ┌─────────────────────────┐
                    │      Bedrock Proxy      │
                    │  us.anthropic.claude-   │
                    │  opus-4-6-v1            │
                    └─────────────────────────┘

From Chaos to Control: The Architecture Principles

I built this framework around software engineering fundamentals:

1. Separation of Concerns

Don’t let AI do everything. Let it do what it’s good at:

  • Planning Agent: Analyzes requirements, creates task breakdown
  • Task Executor: Executes individual tasks with context
  • Orchestrator: Controls workflow lifecycle (not AI)
  • Dependency Graph: Manages execution order (not AI)
  • State Manager: Persists state (not AI)

2. Explicit Dependencies

No more hoping the AI always executes tasks in the required order. The model may know that an order exists, but during autonomous development over a large context window it can still get that order wrong:

# Before: Hope

prompt = "Do A, then B, then C..."

# After: Engineering

tasks = [
    Task(id="task-1", name="A", dependencies=[]),
    Task(id="task-2", name="B", dependencies=["task-1"]),
    Task(id="task-3", name="C", dependencies=["task-2"]),
]
dag = DependencyGraph(tasks)  # Enforced order
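For readers who want to see what sits behind `Task` and `DependencyGraph`, here is one possible minimal sketch. The method names `get_ready_tasks`, `mark_completed`, and `is_complete` follow the calls used elsewhere in this article. The field layout and the validation logic are my assumptions, not the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Task:
    id: str
    name: str
    dependencies: List[str] = field(default_factory=list)


class DependencyGraph:
    """Tracks which tasks may run, based only on declared dependencies."""

    def __init__(self, tasks: List[Task]) -> None:
        self.tasks: Dict[str, Task] = {task.id: task for task in tasks}
        self.completed: Set[str] = set()
        self._validate()

    def _validate(self) -> None:
        # Reject references to tasks that do not exist.
        for task in self.tasks.values():
            for dep in task.dependencies:
                if dep not in self.tasks:
                    raise ValueError(f"{task.id} depends on unknown task {dep}")
        # Reject cycles by repeatedly peeling off tasks with no open dependencies.
        remaining = {t.id: set(t.dependencies) for t in self.tasks.values()}
        while remaining:
            free = [tid for tid, deps in remaining.items() if not deps]
            if not free:
                raise ValueError(f"dependency cycle among {sorted(remaining)}")
            for tid in free:
                del remaining[tid]
            for deps in remaining.values():
                deps.difference_update(free)

    def get_ready_tasks(self) -> List[Task]:
        """Return unfinished tasks whose dependencies are all completed."""
        return [
            task
            for task in self.tasks.values()
            if task.id not in self.completed
            and all(dep in self.completed for dep in task.dependencies)
        ]

    def mark_completed(self, task_id: str) -> None:
        if task_id not in self.tasks:
            raise KeyError(task_id)
        self.completed.add(task_id)

    def is_complete(self) -> bool:
        return len(self.completed) == len(self.tasks)
```

Validating at construction time means a bad plan fails before any agent call is made, and the ready list comes back in insertion order, so the schedule is the same on every run.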

3. State Management

Track everything, always:

workflow.status = "running"
state_manager.save_workflow(workflow) # Persisted
task.status = "completed"
task.result = result
task.completed_at = datetime.now()
state_manager.save_workflow(workflow) # Recoverable
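A state manager does not need to be elaborate to be useful. The sketch below persists each workflow as a JSON file, assuming the workflow can be represented as a plain dictionary with an `id` key. That representation, the directory layout, and the atomic-write approach are illustrative assumptions. Only the method names `save_workflow` and `load_workflow` come from the snippets in this article.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


class StateManager:
    """Stores one JSON file per workflow under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, workflow_id: str) -> Path:
        return self.root / f"{workflow_id}.json"

    def save_workflow(self, workflow: Dict[str, Any]) -> None:
        # Write to a temporary file first, then swap it into place, so a
        # crash mid-write never leaves a half-written state file behind.
        fd, tmp_name = tempfile.mkstemp(dir=str(self.root), suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(workflow, handle, indent=2)
        os.replace(tmp_name, self._path(workflow["id"]))

    def load_workflow(self, workflow_id: str) -> Dict[str, Any]:
        with open(self._path(workflow_id), encoding="utf-8") as handle:
            return json.load(handle)
```

Saving after every status change is what makes a workflow recoverable: after a crash, the last file on disk says exactly which tasks finished.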

4. Context Flow

Tasks build on each other’s work:

def _build_context(task, workflow):
    context = {}
    for dep_id in task.dependencies:
        dep_task = get_task(dep_id)
        context[dep_id] = dep_task.result  # Explicit context
    return context
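Once the context dictionary exists, it still has to reach the model. One simple way to do that, shown here purely as an assumption about how the executor prompt could be assembled, is to render the dependency results into the task prompt. The function `render_prompt` and its `max_chars` limit are hypothetical.

```python
from typing import Dict


def render_prompt(description: str, context: Dict[str, str], max_chars: int = 2000) -> str:
    """Build an executor prompt from a task description and dependency results.

    Each upstream result is truncated to `max_chars` to keep the prompt bounded.
    """
    lines = [f"Task: {description}"]
    if context:
        lines.append("Results from completed dependencies:")
        for dep_id in sorted(context):
            lines.append(f"- {dep_id}: {context[dep_id][:max_chars]}")
    return "\n".join(lines)
```

Each task then receives only the results it declared a dependency on, rather than the whole conversation so far.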

5. Controllability

Every decision is code, not AI:

  • Which tasks are ready? → Dependency graph decides
  • When to retry? → Orchestrator decides
  • What’s the state? → State manager knows
  • How to recover? → Workflow persistence enables it

The result? AI is powerful, but the system is in control.

How We Approach a Problem Is Everything: Before vs. After

Let me show you what changed:

Before: The Unstructured Approach

# Single giant prompt approach
prompt = """
Add pet insurance product:
1. Update availability service
2. Configure recommendation engine
3. Add to dist-file-store
4. Update insurance contracts
5. Configure admin UI
6. Run e2e tests
"""
result = claude_sdk.query(prompt)
# What happened? ¯\_(ツ)_/¯
# Did it all work? Who knows
# Something failed? Start over
# Need to retry one step? Can't

Problems:

  • ❌ No visibility into progress
  • ❌ No way to retry individual failures
  • ❌ No dependency enforcement
  • ❌ No state tracking
  • ❌ No scalability
  • ❌ No control

After: The Engineered Approach

# Structured workflow approach
workflow = orchestrator.create_workflow(
    "Add pet insurance product"
)
# Planning Agent breaks it down
# → task-1: service-1 (no deps)
# → task-2: service-2 (no deps)
# → task-3: configuration-1 (no deps)
# → task-4: contracts (deps: 1,2,3)
# → task-5: admin-ui (deps: 4)
# → task-6: e2e-tests (deps: all)

# Orchestrator executes with control
orchestrator.execute_workflow(workflow.id)

# Every step is tracked
workflow = state_manager.load_workflow(workflow.id)  # Reload persisted state
print(f"Status: {workflow.status}")
for task in workflow.tasks:
    summary = (task.result or "")[:50]  # A failed task may have no result
    print(f" {task.id}: {task.status} - {summary}")

    # Failed? Retry just that task
    if task.status == "failed":
        orchestrator.retry_task(workflow.id, task.id)

Benefits:

  • ✅ Complete visibility
  • ✅ Granular retry control
  • ✅ Enforced dependencies
  • ✅ Full state persistence
  • ✅ Scales to N services
  • ✅ Total control
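Granular retry is worth spelling out, because it is the thing a single prompt cannot give you. Here is a minimal sketch of what a retry could reset, assuming tasks are stored as dictionaries with `status`, `result`, and `dependencies` keys. The function name `reset_for_retry` and that data shape are my assumptions.

```python
from typing import Dict, List


def reset_for_retry(tasks: Dict[str, dict], task_id: str) -> List[str]:
    """Mark a failed task, and everything downstream of it, as pending again.

    Completed tasks that do not depend on the failed one are left untouched.
    Returns the ids that were reset, in discovery order.
    """
    if tasks[task_id]["status"] != "failed":
        raise ValueError(f"{task_id} is not in a failed state")

    to_reset = [task_id]
    seen = {task_id}
    queue = [task_id]
    while queue:
        current = queue.pop(0)
        # Find tasks that directly depend on the current one.
        for other_id, other in tasks.items():
            if current in other["dependencies"] and other_id not in seen:
                seen.add(other_id)
                to_reset.append(other_id)
                queue.append(other_id)

    for reset_id in to_reset:
        tasks[reset_id]["status"] = "pending"
        tasks[reset_id]["result"] = None
    return to_reset
```

With the pet insurance plan, a failure in task-3 would reset task-3 and the tasks that depend on it, while the completed task-1 and task-2 keep their results.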

This is what kept me up at night — and this is what I built.

The Complete Flow


# The demo in the CLI
source venv/bin/activate && python src/cli.py create "Calculate the sum of 5 and 3"

Requirement: Calculate the sum of 5 and 3

→ Invoking planning-agent...
Agent file: /claude-jobs/.claude/agents/planning-agent.md
Max turns: 5
Using claude_agent_sdk...
Loading agent definition...
Loaded agent: planning-agent
Configuring agent options...
Executing agent query...
.

[DEBUG] Agent response (467 chars):
```json
{
  "tasks": [
    {
      "id": "task_1",
      "description": "Identify the operands and operation: operands are 5 and 3, operation is addition (sum)",
      "dependencies": []
    },
    {
      "id": "task_2",
      "description": "Perform the addition: 5 + 3 = 8",
      "dependencies": ["task_1"]
    },
    {
      "id": "task_3",
      "description": "Return the final result: The sum of 5 and 3 is 8",
      "dependencies": ["task_2"]
    }
  ]
}
```


[DEBUG] Extracted JSON:
{
  "tasks": [
    {
      "id": "task_1",
      "description": "Identify the operands and operation: operands are 5 and 3, operation is addition (sum)",
      "dependencies": []
    },
    {
      "id": "task_2",
      "description": "Perform the addition: 5 + 3 = 8",
      "dependencies": ["task_1"]
    },
    {
      "id": "task_3",
      "description": "Return the final result: The sum of 5 and 3 is 8",
      "dependencies": ["task_2"]
    }
  ]
}
✓ Workflow created: wf_7421bc24

Planned Tasks:
• task-1: Identify the operands and operation
• task-2: Perform the addition (depends on: task-1)
• task-3: Return the final result (depends on: task-2)

Executing workflow...

→ Invoking task-executor-agent...
Agent file: /claude-jobs/.claude/agents/task-executor-agent.md
Max turns: 5
Using claude_agent_sdk...
Loading agent definition...
Loaded agent: task-executor-agent
Configuring agent options...
Executing agent query...
.

→ Invoking task-executor-agent...
Agent file: /claude-jobs/.claude/agents/task-executor-agent.md
Max turns: 5
Using claude_agent_sdk...
Loading agent definition...
Loaded agent: task-executor-agent
Configuring agent options...
Executing agent query...
.

→ Invoking task-executor-agent...
Agent file: /claude-jobs/.claude/agents/task-executor-agent.md
Max turns: 5
Using claude_agent_sdk...
Loading agent definition...
Loaded agent: task-executor-agent
Configuring agent options...
Executing agent query...
.

Task Status - Live Status
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ Task ID    ┃ Name                           ┃ Status          ┃ Dependencies         ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ task-1     │ Identify the operands and      │ ✅ completed    │ None                 │
│            │ operation                      │                 │                      │
│ task-2     │ Perform the addition           │ ✅ completed    │ task-1               │
│ task-3     │ Return the final result        │ ✅ completed    │ task-2               │
└────────────┴────────────────────────────────┴─────────────────┴──────────────────────┘

✓ Workflow completed!

╭─────────────────────────────────────────────────────────────────── task-1: Identify the operands and operation ───────────────────────────────────────────────────────────────────╮
│ ## Task Result: Identify Operands and Operation │
│ │
│ **Operands:** 5 and 3 │
│ │
│ **Operation:** Addition (sum) │
│ │
│ **Expression:** 5 + 3 │
│ │
│ **Result:** 8 │
│ │
│ ### Summary: │
│ - **First operand:** 5 │
│ - **Second operand:** 3 │
│ - **Operation:** Addition (`+`) │
│ - **Computed result:** 5 + 3 = **8** │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭────────────────────────────────────────────────────────────────────────── task-2: Perform the addition ───────────────────────────────────────────────────────────────────────────╮
│ ## Task Result: Perform the Addition │
│ │
│ **5 + 3 = 8** │
│ │
│ ### Calculation Details: │
│ - **First operand:** 5 │
│ - **Second operand:** 3 │
│ - **Operation:** Addition (+) │
│ - **Result:** **8** │
│ │
│ The addition has been performed and confirmed. The sum of 5 and 3 equals 8. │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────────────────────────────── task-3: Return the final result ─────────────────────────────────────────────────────────────────────────╮
│ The sum of 5 and 3 is **8**. │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Final Thoughts: Engineering Beats Magic

Building this POC taught me something profound about AI-powered development:

The problem isn’t making AI smarter. The problem is making it engineerable.

When I started, I thought the challenge was getting Claude to understand requirements better, or write better code, or be more reliable.

I was wrong.

The challenge was bringing software engineering discipline to AI-powered systems.

What Changed My Thinking

Before: “How do I make AI do everything perfectly?”

After: “How do I architect a system where imperfect AI components work reliably?”

The answer:

  • Explicit dependencies instead of implicit ordering
  • State management instead of hope and prayer
  • Controlled execution instead of black boxes
  • Retry mechanisms instead of manual intervention
  • Context flow instead of re-explaining everything
  • Monitoring instead of wondering what happened
  • Determinism through architecture instead of probabilistic chaos

The Real Innovation

It’s not the AI. Claude was always powerful.

The innovation is the wrapper around it:

# The magic isn't here
result = claude.generate_code(prompt)

# The magic is here
class WorkflowOrchestrator:
    def execute_workflow(self, workflow_id):
        workflow = self.state_manager.load_workflow(workflow_id)
        dag = DependencyGraph(workflow.tasks)  # Structure
        while not dag.is_complete():  # Control
            ready_tasks = dag.get_ready_tasks()  # Smart scheduling
            for task in ready_tasks:
                context = self._build_context(task, workflow)  # Context
                result = self.executor.execute(task, context)  # AI here
                task.result = result
                self.state_manager.save_workflow(workflow)  # Tracking
                dag.mark_completed(task.id)  # Progress

This is software engineering. This scales. This we can build on.
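To show how little the control loop depends on the model, here is a self-contained sketch with the AI call replaced by a stub. The function `run_workflow`, the `execute` callback signature, and the plain-dictionary task format are illustrative assumptions rather than the framework's real interfaces.

```python
from typing import Callable, Dict, List


def run_workflow(
    tasks: Dict[str, List[str]],
    execute: Callable[[str, Dict[str, str]], str],
) -> Dict[str, str]:
    """Run tasks in dependency order and return results keyed by task id.

    `tasks` maps each task id to the ids it depends on. `execute` is the
    only place where an AI call would happen. Everything else is plain code.
    """
    results: Dict[str, str] = {}
    pending = dict(tasks)
    while pending:
        ready = [
            task_id
            for task_id, deps in pending.items()
            if all(dep in results for dep in deps)
        ]
        if not ready:
            # A cycle or a missing dependency: stop instead of looping forever.
            raise RuntimeError(f"no runnable tasks, blocked: {sorted(pending)}")
        for task_id in ready:
            context = {dep: results[dep] for dep in pending[task_id]}
            results[task_id] = execute(task_id, context)
            del pending[task_id]
    return results
```

Swapping the stub for a real agent call changes nothing about ordering, context passing, or termination, which is the point of keeping those concerns outside the model.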

What This Means for AI Development

We’re at an inflection point. AI can write code, but we need to write the systems that use AI to write code.

Those systems need:

  • Architecture (not just prompts)
  • State management (not just memory)
  • Error handling (not just retries)
  • Observability (not just logging)
  • Scalability (not just bigger contexts)
  • Determinism (not just hope)

This is the future: Not AI replacing engineers, but engineers building systems that orchestrate AI.

Can AI Build Complex Systems?

After building this POC: Yes, but only with the right architecture.

The sleepless nights wondering “where’s the engineering?” led to this framework. And this framework proves we can bring discipline to chaos.

The next step? Scale it. Productionize it. Make it handle real-world complexity.

But the foundation is solid: AI is powerful. Engineering makes it reliable. Architecture makes it deterministic.

The Takeaway

If you’re running autonomous AI agents to write code anywhere beyond your own laptop, ask yourself:

  • Can you retry a single failed step?
  • Do you know what’s running right now?
  • Can you handle dependencies reliably?
  • Can you scale beyond one task?
  • Can you recover from failures?
  • Can you predict the execution order?
  • Can you reproduce the same workflow twice?

If the answer is “no,” you don’t have an AI problem — you have an architecture problem.

Build the orchestration. Add the structure. Bring the discipline.

AI is the engine. Engineering is the chassis. Architecture is the foundation.

Without the foundation, even the most powerful engine with the best chassis is still unreliable.


From Chaos to Control: Bringing Engineering Discipline to AI-Powered Development was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
