Most teams using Claude Code are doing it wrong.
They treat the AI like a single, brilliant intern — toss it a task, review the output, fix the mess, repeat. It works, sort of. But it’s like hiring a concert pianist and asking them to only play chopsticks. The real power isn’t in having one agent do everything. It’s in making the agent switch roles at precisely the right moment in your development lifecycle.
Garry Tan — Y Combinator’s CEO, former early engineer at Palantir — recently open-sourced his Claude Code setup and shared the numbers: 10,000 lines of code and 100 pull requests per week over a 50-day stretch. Andrej Karpathy told the No Priors podcast in March 2026 that he hasn’t typed a line of code since December. Peter Steinberger built OpenClaw — 247K GitHub stars — essentially solo with AI agents.
These aren’t people who got lucky with prompts. They built process around their agents. And the process is what separates “useful toy” from “shipping machine.”

The Core Tension Nobody Talks About
Agents are fast but unsupervised. Humans are slow but have judgment. Every choice in your pipeline architecture comes down to routing each decision to the actor best equipped to make it.
The mistake most teams make: they bolt Claude Code onto their existing workflow and call it a day. Agent writes code, human reviews the PR, done. This misses the entire point. The review phase is too late to catch the real problems — architectural misjudgments, wrong decomposition, misunderstood requirements. By the time you’re reading a 400-line diff, the damage is done.
The fix is upstream. Move human judgment earlier. Move agent autonomy later.
The Five-Phase Pipeline
Each phase has a clear owner — agent or human — and the handoff points are deliberate.
Phase 1: Planning (Human + Agent, collaborative)
A human writes a brief. It can be a one-liner: “add rate limiting to the /api/upload endpoint.” Then the agent produces a plan artifact — which files change, what the approach is, edge cases it foresees, estimated blast radius.
The human reviews the plan, not the code. This takes two minutes instead of twenty. And it catches the architectural mistakes that no amount of code review would fix. The plan gets committed as a markdown file alongside the PR later. Six months from now, when someone asks “why did we build it this way?” the answer is right there in git history.
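What does a plan artifact look like? A hedged sketch of what the agent might produce for the rate-limiting brief above — the file name, structure, and every detail are illustrative, not a fixed format:

```markdown
# Plan: rate limiting for /api/upload

## Approach
Token-bucket limiter as middleware, keyed by API key, backed by the
existing Redis instance.

## Files changed
- middleware/rate_limit.ts (new)
- routes/upload.ts (wire in the middleware)
- middleware/rate_limit.test.ts (new)

## Edge cases
- Missing API key: reject with 401 before touching any bucket
- Redis unavailable: fail open, log a warning

## Blast radius
Upload route only; no schema or shared-middleware changes.
```

Two minutes of reading this catches a wrong approach before a single line of code exists.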
Phase 2: Implementation (Agent-led, sandboxed)
The agent works on a feature branch. Never main. This is non-negotiable.
It runs in a loop: implement, run tests, fix failures, run linter, fix, repeat. The agent should self-heal before a human ever sees the output. If your CLAUDE.md says "every public function gets a test," the agent writes the tests. If it says "never use raw SQL — use the query builder," the agent follows that rule every single time, at 3am, on its hundredth PR, without getting sloppy.
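The implement-test-fix loop is just a bounded retry loop. A minimal sketch in shell, where everything is a stand-in: run_checks simulates your real test-and-lint command (here it fails twice before passing), and agent_fix simulates asking the agent to repair the failure. The retry cap matters — an agent that can't converge should escalate, not thrash.

```shell
#!/bin/sh
# Sketch of the phase-2 self-heal loop. run_checks and agent_fix are
# simulations; substitute your project's real test/lint commands and
# your real agent invocation.
attempts=0
max_attempts=5

run_checks() {
  # Stand-in for: npm test && npm run lint.
  # Simulates two failures before going green.
  [ "$attempts" -ge 3 ]
}

agent_fix() {
  # Stand-in for asking the agent to repair the failing check.
  echo "agent: fixing failure #$attempts"
}

until run_checks; do
  attempts=$((attempts + 1))
  if [ "$attempts" -gt "$max_attempts" ]; then
    echo "giving up: escalate to a human" >&2
    exit 1
  fi
  agent_fix
done
echo "all checks green after $attempts fixes"
```

The human only ever sees the output of the final, green iteration.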
One critical constraint: scope lock. The agent’s task is defined by the plan. If it discovers something else that needs fixing along the way, it logs it as a separate issue. It does not scope-creep the current PR. This is the single biggest source of quality problems with agent-written code, and almost nobody enforces it.

Phase 3: Automated Quality Gates (No humans, just CI)
Before any human sees the code, these pass automatically: tests (unit and integration), lint, type checking, a diff size check (PRs over 400 lines get flagged — oversized agent PRs are a smell that the task wasn’t decomposed well), a dependency audit (did the agent add something new?), and security scanning.
None of this requires a human. It’s table stakes.
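The diff-size gate in particular is cheap to script. A minimal sketch that sums added and deleted lines from `git diff --numstat` output (a sample is inlined so the snippet is self-contained; in CI you would pipe in the real diff against main, and the 400-line threshold matches the one above):

```shell
#!/bin/sh
# Hedged sketch of a diff-size gate. In CI, replace the inlined sample
# with: git diff --numstat main...HEAD
# numstat format: <added> <deleted> <path> per file.
numstat="120 30 src/upload.ts
15 2 src/schema.ts"

changed=$(printf '%s\n' "$numstat" |
  awk '{ added += $1; deleted += $2 } END { print added + deleted }')

if [ "$changed" -gt 400 ]; then
  echo "FLAG: diff is $changed lines; the task likely needs decomposing"
else
  echo "OK: diff is $changed lines"
fi
```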
Phase 4: Code Review (Human-led, agent-assisted)
This is the critical gate. But you can make it dramatically faster.
The agent generates the PR description explaining what changed, why, and what the reviewer should pay attention to. Then a second Claude Code instance reviews the first one’s work. This sounds redundant, but it catches a surprising amount — the reviewer agent has fresh context and no attachment to the implementation decisions.
The human reviewer then sees: the original plan, the agent’s PR description, the second agent’s review, and all CI green. Their job narrows to: “Does this match the intent? Are there architectural concerns the agents can’t see? Any business logic subtleties?”
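In practice the second-instance review can be a one-line CI step using Claude Code’s headless print mode (`claude -p`). The prompt wording and diff range below are illustrative, not a prescribed incantation:

```shell
# Hypothetical review step: feed the branch diff to a fresh Claude Code
# instance that has no context from the implementing session.
git diff main...HEAD | claude -p "Review this diff as a staff engineer. \
Flag correctness risks, scope creep relative to the committed plan, \
and anything the human reviewer should look at first."
```

The point of the fresh instance is the clean slate: it reads the diff the way the human reviewer will, not the way the implementer remembers it.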
One rule worth enforcing: the person who wrote the brief should not be the reviewer. Same principle as traditional code review. Fresh eyes catch what familiar eyes miss.
Phase 5: Merge and Deploy
Squash merge to main. Agents generate noisy commit histories, and squashing keeps things clean. Deploy to staging. Smoke tests run. Human promotes to production.
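The merge step maps onto a couple of commands. A sketch using the GitHub CLI — the PR number is illustrative, and the deploy and smoke-test scripts are placeholders for whatever your project actually uses:

```shell
gh pr merge 123 --squash --delete-branch   # PR number is illustrative
./deploy.sh staging                        # placeholder deploy script
./smoke_tests.sh staging                   # placeholder smoke tests
# A human promotes to production only after everything above is green.
```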

The CLAUDE.md File Is Your Most Important Investment
A CTO friend of Garry Tan’s texted him shortly after trying his setup: “Your gstack is crazy. This is like god mode. Your eng review discovered a subtle cross-site scripting attack that I don’t even think my team is aware of.”
That didn’t happen because of a clever prompt. It happened because of a well-structured CLAUDE.md — the file Claude Code reads at the start of every session.
Your CLAUDE.md should contain: the project's architecture decisions and why they were made, naming conventions, forbidden patterns ("never use any in TypeScript," "never raw SQL"), test patterns, file organization rules, and the specific commands to run for building, testing, and linting.
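A compressed sketch of what such a file might look like — every rule, path, and command below is an example for illustration, not a recommendation for your project:

```markdown
# CLAUDE.md

## Architecture
API routes are thin; business logic lives in services/. We chose Postgres
over Mongo for transactional integrity.

## Forbidden patterns
- Never use `any` in TypeScript; prefer `unknown` plus narrowing.
- Never write raw SQL; use the query builder.

## Testing
Every public function gets a test. Tests live next to the code as *.test.ts.

## Commands
- Build: npm run build
- Test:  npm test
- Lint:  npm run lint
```

Notice the “why” attached to the architecture rule. An agent that knows the reason for a constraint is far less likely to route around it.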
Every agent session reads this first. It replaces a senior dev looking over the agent’s shoulder. And unlike a senior dev, it never gets tired of repeating itself.
For teams, you actually want a hierarchy: a global ~/.claude/CLAUDE.md for personal preferences, a project-level CLAUDE.md for repo-specific rules, and potentially org-level settings that apply everywhere. The project-level file is the one that matters most.
A software engineer named Chamith Madusanka wrote about integrating Claude Code into his team’s enterprise Go + React + TypeScript + Terraform codebase. What used to take 30–60 minutes of a senior engineer’s time — reviewing, debugging CI failures, onboarding context — now gets an initial pass in under five minutes. But the key detail: Claude needs the right environment in CI. Their workflow installs all build tools, linters, and test runners so the agent can actually compile, test, and lint the project — not just read files. Without that, you have an agent that gives opinions instead of an agent that verifies its own work.
You Don’t Have to Build This From Scratch
The ecosystem has matured fast. The tools you need already exist.
GStack — Garry Tan’s open-source skill pack — turns Claude Code into a structured virtual engineering team. Twenty-three slash commands, each activating a distinct cognitive mode. /plan-eng-review locks architecture and edge cases. /review does staff-engineer-level code review. /qa opens a real browser with Playwright and tests your app. /ship bootstraps test frameworks, runs coverage, creates the PR. /freeze restricts edits to specific directories so the agent can't touch production code while debugging.
The insight behind GStack is the one that keeps surfacing across every team doing this well: different phases of development require fundamentally different cognitive modes. When you ask the same model to plan, implement, review, and ship in the same conversation, you get a mediocre blend of all four. GStack gives you explicit gears. One team building 100+ skill files over 108 hours of unattended operation found the same thing — separating “plan review” from “code review” was the single biggest quality win. When Claude tries to do both in one pass, it either rubber-stamps or gets lost in details.
Claude Code GitHub Actions — Anthropic’s official integration — runs the full Claude Code runtime inside a GitHub Actions runner. Mention @claude in any PR or issue and it responds. Or configure it headless: it fires on every PR open, CI failure, or issue creation. API cost for 50 PRs a month runs under five dollars.
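A hedged sketch of the mention-triggered setup. The action name, version tag, and input names follow Anthropic’s published examples at the time of writing — check the action’s current documentation before copying:

```yaml
# Illustrative workflow: respond when someone mentions @claude in a comment.
name: claude
on:
  issue_comment:
    types: [created]
jobs:
  claude:
    if: contains(github.event.comment.body, '@claude')
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
      issues: write
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-action@v1   # verify current tag in docs
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
```

Swap the trigger for `pull_request` or `workflow_run` events and drop the mention check to get the headless configuration described above.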
OpenHands — for teams that want fully autonomous workflows. Assign a GitHub issue and an agent picks it up, codes it, tests it, and opens a PR with zero human involvement in between. It runs in sandboxed Docker or Kubernetes environments and supports any model. As one developer put it: “I haven’t tried Devin yet but I love OpenHands. Just opening GitHub issues and the AI figures it out and writes tests and then pushes a PR is magical.”
Superpowers — a TDD-first pipeline with 106K GitHub stars that some teams pair with GStack. GStack handles planning and QA; Superpowers handles the implementation loop. They complement each other well.
Task Decomposition Is the Skill That Matters Now
The number one failure mode isn’t bad tools. It’s giving an agent a task that’s too big.
“Build the authentication system” will produce garbage. “Add password hashing to the signup endpoint using bcrypt, update the user schema, write tests” will produce good code. The difference is decomposition.
This means the most valuable skill for a developer working with agents isn’t writing code anymore. It’s breaking problems into agent-sized pieces — one to three files changed, clear inputs and outputs, well-defined done criteria. The person who can decompose a week-long feature into fifteen clean agent tasks will ship faster than someone manually coding eight hours a day.
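As a concrete sketch, the authentication feature above might decompose into tasks like these — entirely illustrative, but note that each one touches a handful of files and has its own done criteria:

```markdown
1. Add password hashing (bcrypt) to the signup endpoint; update the
   user schema; tests.
2. Add a login endpoint that verifies the hash and issues a session
   token; tests.
3. Add session middleware that rejects missing or expired tokens; tests.
4. Add a logout endpoint that revokes the session; tests.
5. Add rate limiting to the login endpoint to slow brute-force
   attempts; tests.
```

Each item is a clean agent-sized PR, and each can run through the full five-phase pipeline independently.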
Software engineering skill hasn’t become less important. Which skills matter has shifted.
The Mental Model
Stop thinking of your AI coding agent as a tool you use. Start thinking of it as a team you manage.
That shift changes everything. You wouldn’t let a junior dev merge to main without review. You wouldn’t let a senior architect write code without a plan. You wouldn’t skip CI because someone seems really confident. The same discipline applies — the agent is just faster, cheaper, and never takes PTO.
The teams shipping the most reliably with agentic coding aren’t the ones with the best prompts. They’re the ones with the best process. CLAUDE.md files that encode real engineering standards. CI pipelines that catch problems before humans have to. Task decomposition that gives agents work they can actually succeed at. And human review focused on judgment, not line-by-line inspection.
The code writes itself now. The architecture of how it gets written — that’s still on us.
Your AI Coding Agent Isn’t a Team Member. It’s Five of Them. was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.