Claude Code as the operator of a spec-driven pipeline — skills, agents, retrieval, autonomy contract, parallelism, and cross-repo coordination.

Repo: https://github.com/atelier-fashion/adlc-toolkit
The ADLC — Agentic Development Lifecycle — is a spec-driven pipeline that takes a feature request from “someone typed a paragraph in Slack” to “the PR is merged and the service is deployed,” with Claude Code doing the driving. It is not a copilot pattern. It is not code-suggestions-with-human-in-the-loop. It is a protocol that the agent executes end-to-end, pausing only at four declared halt points.
This article describes how it works today. Nine phases, fourteen skills, sixteen specialized agents with model tiering and tool restrictions, a tag-based retriever that pulls prior context across three corpora, an explicit autonomy contract, parallel orchestration across multiple requirements, and coordinated fan-out across multiple repos. If you’re building with Claude Code, the Anthropic Agent SDK, or any multi-agent tool, this is a reference for what a production-grade orchestration layer looks like once you’ve taken it seriously.
The spec-driven contract
Every unit of work is a REQ — a requirement with a spec, a validation gate, an architecture doc, tasks, a review pass, and a wrapup. The REQ is the contract, not the PR. The PR is a byproduct of executing the contract.
The canonical workflow:
/spec → /validate → /architect → /validate → implement → /reflect → /review → merge → /wrapup
Validate appears twice because two different artifacts get gated: the spec (does the requirement express what we actually want?) and the architecture (do the tasks cover the spec, with correct dependencies and repo routing?). Both gates are hard — a failing validation loops back; it does not proceed.
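A minimal sketch of the gate mechanics, assuming a hypothetical validator callback and the three-retry rule declared later in the autonomy contract:

```python
# Minimal sketch of a hard validation gate (all names hypothetical).
# A failing validation loops back to revision; it never advances the pipeline.
from typing import Callable

MAX_RETRIES = 3  # matches halt point A described later: fail after 3 retry loops

class HaltPipeline(Exception):
    """Raised when a declared halt point fires."""

def gate(validate: Callable[[], list[str]],
         revise: Callable[[list[str]], None]) -> None:
    for _ in range(MAX_RETRIES):
        findings = validate()
        if not findings:
            return          # gate passed; the pipeline advances
        revise(findings)    # loop back: fix the artifact, then re-validate
    raise HaltPipeline(f"validation still failing after {MAX_RETRIES} loops")
```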
Every artifact lives in .adlc/ under the project root:
```
.adlc/
  config.yml    # cross-repo configuration (optional)
  context/      # project architecture, conventions, overview, taxonomy
  specs/        # requirement docs, architecture docs, task files
  knowledge/    # validated assumptions, lessons learned
  templates/    # project-local copies of toolkit templates
```
The toolkit repo holds the process — skills, agents, canonical templates. Each code repo holds the artifacts — specs, architecture, knowledge. This separation is load-bearing: the process gets pulled from one source of truth (a symlinked clone at ~/.claude/skills), while the artifacts accumulate as the permanent memory of each project.
Skills and agents
A skill is a user-invocable /command backed by a markdown protocol file. An agent is a specialized subprocess a skill can dispatch — typed by role, configured with a specific Claude model tier, and constrained to an explicit toolset.
Fourteen skills currently ship:
| Skill | Role |
| --- | --- |
| /init | Bootstrap .adlc/ in a new repo |
| /spec | Write requirement specs from feature requests |
| /architect | Design architecture and break a REQ into tasks |
| /validate | Validate any phase output before advancing |
| /proceed | End-to-end pipeline for a single REQ |
| /sprint | Parallel pipeline orchestrator across multiple REQs |
| /reflect | Post-implementation self-review |
| /review | Multi-agent code review |
| /canary | Canary deployment with smoke tests |
| /wrapup | Close a completed feature — merge, deploy, capture knowledge |
| /bugfix | Streamlined bug fix workflow |
| /status | Current state of all ADLC work |
| /analyze | Codebase health audit |
| /optimize | API cost and performance scanner |
| /template-drift | Detect drift between project templates and toolkit templates |
Sixteen agents back those skills, each defined as a markdown file with YAML frontmatter:
```yaml
---
name: correctness-reviewer
description: Reviews code changes for logic errors, race conditions,
  security vulnerabilities, and edge cases.
model: sonnet
tools: Read, Grep, Glob, Bash
---
```
The roster groups into five roles: explorers, reviewers, auditors, scanners, and implementers.
Why the tiering? Explorers are mostly file-finding; Haiku is fast, cheap, and never sees a code change. Convention-auditor is grep-with-a-checklist; same story. Reviewers, auditors, and scanners reason over actual diffs, so Sonnet. Implementers write code that will ship, so Opus. The cost differential between tiers is substantial; the quality differential only matters at the ends of the spectrum.
Tool restrictions are equally deliberate. Reviewers and auditors get Read, Grep, Glob, Bash (Bash for running tests) but no Write or Edit — a reviewer that can change code is a reviewer with a conflict of interest. Explorers get even less: Read, Grep, Glob only. Implementers get the full toolkit because their job is to modify the tree.
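As a data-structure view of that policy, here is an illustrative sketch. Claude Code enforces the frontmatter tools: list natively, so the guard function below is hypothetical, and the scanner toolset is assumed to match the reviewers':

```python
# Illustrative only: Claude Code enforces each agent's `tools:` list itself.
# This sketch just shows the shape of the policy the roster encodes.
ROSTER = {
    # role family     model     allowed tools
    "explorer":      ("haiku",  {"Read", "Grep", "Glob"}),
    "reviewer":      ("sonnet", {"Read", "Grep", "Glob", "Bash"}),
    "auditor":       ("sonnet", {"Read", "Grep", "Glob", "Bash"}),
    "scanner":       ("sonnet", {"Read", "Grep", "Glob", "Bash"}),  # assumed
    "implementer":   ("opus",   {"Read", "Grep", "Glob", "Bash", "Write", "Edit"}),
}

def check_tool(role: str, tool: str) -> None:
    model, allowed = ROSTER[role]
    if tool not in allowed:
        # A reviewer asking for Edit is a conflict of interest by construction.
        raise PermissionError(f"{role} ({model}) may not use {tool}")
```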
The pipeline in action
When you run /proceed REQ-123, the skill walks the full lifecycle: the nine phases of the canonical workflow above, from spec through wrapup.
The piece worth studying is Phase 5. After implementation, the pipeline needs to decide whether the code is ready to ship. It dispatches six agents in parallel in a single assistant turn:
- reflector — honest self-review against a comprehensive checklist
- correctness-reviewer — logic errors, race conditions, edge cases
- quality-reviewer — convention compliance, naming, duplication
- architecture-reviewer — separation of concerns, contract adherence
- test-auditor — coverage gaps, mock completeness
- security-auditor — input validation, auth, data exposure
All six run concurrently, each reading the same diff, each producing a structured findings list. The consolidation step de-dupes overlapping findings, groups by severity, and routes back to task-implementer for fixes — or forward to PR if clean. This phase does the heaviest parallel work in the pipeline, because review at human speed would gate everything else.
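The dispatch itself is a single parallel tool call, so the merge logic is the part worth sketching. A hedged reconstruction, with the finding fields and severity labels assumed:

```python
# Sketch of the findings consolidation step (field names hypothetical).
from dataclasses import dataclass
from itertools import groupby

@dataclass(frozen=True)
class Finding:
    agent: str       # e.g. "correctness-reviewer"
    severity: str    # "critical" | "major" | "minor" (labels assumed)
    file: str
    line: int
    summary: str

def consolidate(findings: list[Finding]) -> dict[str, list[Finding]]:
    # De-dupe overlapping findings: two agents flagging the same file/line/issue
    # collapse to one entry.
    seen, unique = set(), []
    for f in findings:
        key = (f.file, f.line, f.summary)
        if key not in seen:
            seen.add(key)
            unique.append(f)
    # Group by severity so critical items route back to task-implementer first.
    order = {"critical": 0, "major": 1, "minor": 2}
    unique.sort(key=lambda f: order[f.severity])
    return {sev: list(grp) for sev, grp in groupby(unique, key=lambda f: f.severity)}
```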
/proceed operates in one of two modes. In main-conversation mode it dispatches agents in parallel as above. In subagent mode — entered when /sprint spawns multiple /proceed instances — it runs phases sequentially and does not dispatch further subagents, because Claude Code forbids nested subagent spawning. The mode switch is automatic based on how /proceed is invoked.
Context retrieval
Every skill that authors a significant artifact reaches backwards first. The hypothesis is simple: if a prior spec, bug, or lesson documented something relevant, the new artifact should cite it rather than re-derive it.
Every spec, bug, and lesson carries five tag dimensions in its frontmatter:
```yaml
component: payments-api
domain: checkout
stack: [python, postgres, redis]
concerns: [idempotency, rate-limiting]
tags: [stripe, webhooks]
```
When /spec writes a new requirement, it derives a 5-dimensional query from the feature request, runs three parallel Grep passes (one per corpus), scores each hit with a weighted formula, and merges into a single ranked list:

| Signal | Weight |
| --- | --- |
| component match | +3 |
| domain match | +2 |
| each concern match | +2 |
| each stack match | +1 |
| each tag match | +1 |
| lesson floor (lessons corpus only) | +1 minimum |
The weights are deliberately blunt. Component match (+3) dominates because it’s the most specific dimension. Domain (+2) and each concern match (+2) are next. Stack and tags get +1 each because they’re broad — nearly every Python service shares the python stack tag. Lessons get a +1 foundational floor even at zero tag overlap, because a well-written lesson often applies cross-cuttingly and shouldn’t be filtered out aggressively.
Two design choices matter. First, global top-15, no per-corpus quotas — if five prior specs are all clearly relevant, the list should reflect that rather than dilute the ranking for variety’s sake. Second, citations are mandatory. Every new spec must include a Retrieved Context section listing every source ID, corpus, and score. Without citations, retrievals can’t be audited; without the self-tag, the new spec can’t be retrieved by tomorrow’s REQ. The tagged corpus is the product.
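The whole scorer fits in a few lines. The weights and the top-15 cutoff are the ones described above; the field names and corpus layout are assumptions:

```python
# Sketch of the weighted retrieval scorer (weights from the text; the rest assumed).
WEIGHTS = {"component": 3, "domain": 2, "concern": 2, "stack": 1, "tag": 1}
LESSON_FLOOR = 1   # lessons score at least 1 even with zero tag overlap
TOP_K = 15         # global top-15, no per-corpus quotas

def score(query: dict, doc: dict, corpus: str) -> int:
    s = 0
    if doc.get("component") == query.get("component"):
        s += WEIGHTS["component"]
    if doc.get("domain") == query.get("domain"):
        s += WEIGHTS["domain"]
    s += WEIGHTS["concern"] * len(set(doc.get("concerns", [])) & set(query.get("concerns", [])))
    s += WEIGHTS["stack"] * len(set(doc.get("stack", [])) & set(query.get("stack", [])))
    s += WEIGHTS["tag"] * len(set(doc.get("tags", [])) & set(query.get("tags", [])))
    if corpus == "lessons":
        s = max(s, LESSON_FLOOR)  # well-written lessons apply cross-cuttingly
    return s

def retrieve(query: dict, corpora: dict[str, list[dict]]) -> list[tuple[int, str, dict]]:
    hits = [(score(query, d, c), c, d) for c, docs in corpora.items() for d in docs]
    return sorted(hits, key=lambda h: h[0], reverse=True)[:TOP_K]
```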
Autonomy by design
The default behavior of an LLM is to confirm. A pipeline that confirms at every phase boundary turns a feature ship into a guided tour of permission prompts. The ADLC treats autonomy as a product decision — enumerate the halts, eliminate everything else by design.
The Autonomous Execution Contract sits at the top of /proceed and declares exactly four legitimate halt points:
A. Validation failure after 3 retry loops
B. Reflector surfaces questions requiring user decisions
C. Canary deployment failure
D. Merge conflict that can’t be auto-resolved
Everything else is a log, not a pause. Phase boundaries are announcement points, not confirmation gates.
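Reduced to code, the contract is deliberately small. A purely illustrative sketch (the real contract is prose at the top of the /proceed skill file):

```python
# Sketch: the four declared halt points as an enum; everything else just logs.
from enum import Enum

class Halt(Enum):
    VALIDATION_EXHAUSTED = "A"  # validation failure after 3 retry loops
    REFLECTOR_QUESTIONS = "B"   # reflector surfaces questions needing user decisions
    CANARY_FAILURE = "C"        # canary deployment failure
    MERGE_CONFLICT = "D"        # merge conflict that can't be auto-resolved

def end_of_phase(phase: str, halt: Halt | None = None) -> None:
    if halt is not None:
        raise RuntimeError(f"halt point {halt.value}: {halt.name}")  # pause for the user
    print(f"End-of-phase log: {phase} complete")  # an announcement, not a gate
```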
Three mechanisms enforce this. First, a settings allowlist at .claude/settings.json pre-approves the routine operations the pipeline runs dozens of times: git read/write, gh pr create, gh pr view, npm test, agent-dispatch calls. Destructive operations stay on the ask list: git push --force to main, rm -rf, gh pr merge, terraform apply, terraform destroy. Safety floor preserved; friction eliminated. /init scaffolds this file into every new project; .claude/settings.local.json is gitignored for per-user overrides.
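For the shape of that file, here is a sketch of the scaffold /init might write. The Tool(pattern) permission syntax is Claude Code's; the exact rule strings below are illustrative, not copied from the toolkit:

```python
# Illustrative scaffold of .claude/settings.json (rule strings assumed, not
# the toolkit's actual list).
import json
import pathlib

settings = {
    "permissions": {
        "allow": [                      # pre-approved routine operations
            "Bash(git add:*)", "Bash(git commit:*)", "Bash(git push:*)",
            "Bash(gh pr create:*)", "Bash(gh pr view:*)",
            "Bash(npm test:*)",
        ],
        "ask": [                        # destructive operations still prompt
            "Bash(git push --force:*)", "Bash(rm -rf:*)",
            "Bash(gh pr merge:*)",
            "Bash(terraform apply:*)", "Bash(terraform destroy:*)",
        ],
    }
}

pathlib.Path(".claude").mkdir(exist_ok=True)
pathlib.Path(".claude/settings.json").write_text(json.dumps(settings, indent=2))
```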
Second, skill wording is deliberate. Phase boundaries emit an “End-of-phase log” — a log is something you write, not a gate you’re about to cross. Ambiguous phrases like “Status Update:” and “Present for review” were systematically removed because they leaked into agent behavior as implicit confirmation prompts.
Third, Phase 5 is a single gate. The six-agent review dispatch emits one assistant message, waits for all six agents to return in parallel, and consolidates. There’s no “here’s what the correctness reviewer said, shall I dispatch the next one?” rhythm — that pattern would turn one gate into six.
The combined result: a clean /proceed run walks all nine phases without a single user-facing prompt unless one of the four declared halts actually fires.
Parallelism
If /proceed takes one REQ to production, /sprint takes several concurrently. “Run REQ-201, REQ-204, and REQ-207 to production” fires three pipelines, each in its own worktree, each isolated from the others’ shell state.
Three mechanics compose:
- Atomic REQ counter. Two concurrent /spec runs would otherwise race on the next REQ number. The counter bump is wrapped in an mkdir-based lock: whichever session succeeds in creating the directory wins; the other retries with the next number. Cheap, atomic, works across processes (see the sketch after this list).
- Worktree isolation. Each /sprint-launched pipeline gets its own .worktrees/REQ-xxx/. Concurrent branches, builds, and gh calls don’t collide because they’re operating on independent checkouts of the same repo.
- pipeline-runner agent. /sprint doesn’t spawn /proceed directly. It dispatches a pipeline-runner subagent per REQ, each running /proceed in subagent mode — sequential phases, no nested subagent spawns. This side-steps Claude Code’s rule that a subagent cannot spawn another subagent, which would otherwise kill Phase 5’s six-agent fan-out the moment /sprint got involved.
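The counter lock is small enough to sketch in full. A hedged reconstruction, assuming the counter lives in a plain text file; the toolkit's exact retry semantics (the loser takes the next number) may differ slightly from the wait-and-retry shown here:

```python
# Sketch of the mkdir-based atomic REQ counter (file layout assumed).
# os.mkdir is atomic: exactly one concurrent caller can create the lock
# directory, so exactly one session bumps the counter at a time.
import os
import time

def next_req_number(counter_file: str = ".adlc/req-counter") -> int:
    lock_dir = counter_file + ".lock"
    while True:
        try:
            os.mkdir(lock_dir)          # atomic: one process wins the lock
        except FileExistsError:
            time.sleep(0.05)            # another session holds it; retry
            continue
        try:
            try:
                current = int(open(counter_file).read())
            except FileNotFoundError:
                current = 0             # first REQ in a fresh repo
            with open(counter_file, "w") as f:
                f.write(str(current + 1))
            return current + 1
        finally:
            os.rmdir(lock_dir)          # release the lock
```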
The operational payoff: kick off five REQs on a Friday afternoon, come back Monday, find five merged PRs. Each pipeline halts only on the four legitimate halt points. When one pauses for a real reason, the other four keep running — you triage whichever one actually needs attention rather than babysitting all five.
Cross-repo coordination
The hardest feature the ADLC ships is cross-repo: a single requirement that touches multiple repos — say, a backend API, a mobile app, a web app, and infrastructure. The design principle: the primary is per-REQ. Whichever repo you invoke /proceed from becomes the primary for that REQ — it holds the spec, the tasks, the pipeline state. A different REQ originating in a sibling makes that sibling the primary. No central coordinator.
Every repo that can originate REQs has its own .adlc/ and its own .adlc/config.yml. Configs are mirror images — each repo marks itself primary: true and lists the others as siblings:
```yaml
# .adlc/config.yml in admin-api
repos:
  admin-api:
    primary: true
  infrastructure:
    path: ../infrastructure
  atelier-fashion:
    path: ../atelier-fashion
  atelier-web:
    path: ../atelier-web
merge_order:
  - infrastructure
  - admin-api
  - atelier-fashion
  - atelier-web
services:
  admin-api:
    cloud_run_service: admin-api
    region: us-central1
    image_path: us-central1-docker.pkg.dev/<project>/admin-api/admin-api
```
/proceed reads this config and orchestrates the fan-out across every repo the REQ touches.
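A sketch of that read-and-fan-out step, assuming PyYAML and the config shape shown above; the single-repo fallback mirrors the backward-compatibility rule described below:

```python
# Sketch: load .adlc/config.yml and derive the fan-out plan (PyYAML assumed).
import os
import yaml

def load_plan(repo_root: str) -> dict:
    cfg_path = os.path.join(repo_root, ".adlc", "config.yml")
    if not os.path.exists(cfg_path):
        return {"mode": "single-repo"}      # legacy fallback: no config, no fan-out
    cfg = yaml.safe_load(open(cfg_path))
    repos = cfg.get("repos", {})
    if len(repos) <= 1:
        return {"mode": "single-repo"}      # a one-repo config behaves the same
    # The invoking repo marks itself primary: true in its own config.
    primary = next(r for r, v in repos.items() if v.get("primary"))
    return {
        "mode": "cross-repo",
        "primary": primary,
        "siblings": {r: v["path"] for r, v in repos.items() if r != primary},
        "merge_order": cfg.get("merge_order", []),  # /wrapup walks this order
        "services": cfg.get("services", {}),        # /canary reads targets here
    }
```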
Each skill adapts to cross-repo configuration:
- /architect requires repo: frontmatter on every task it generates; task files must live in the declared repo.
- /validate checks repo-field discipline — present, valid id, path matches (see the sketch after this list).
- /canary reads services: by repo id from config.yml. Single-repo projects use auto-detect fallback.
- /wrapup walks merge_order per-repo for commit, push, merge, cleanup. Ship summary lists every PR.
- /status detects cross-repo mode, scans sibling state files, and surfaces a “Cross-Repo Activity” section listing REQs that originate elsewhere but touch this repo.
- /sprint pre-flights worktree collisions across touched siblings. One sprint still originates all REQs from the invoking repo; cross-repo fan-out inside each REQ is delegated to the underlying /proceed.
- /bugfix supports repo: or touched_repos: on bug frontmatter; cds into the target repo before fixing; cross-repo bugs open one PR per touched repo with a shared branch name.
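The repo-field discipline that /validate enforces is mechanical enough to sketch; the task frontmatter shape below is an assumption:

```python
# Sketch of /validate's repo-field discipline check (frontmatter shape assumed).
def check_repo_field(task: dict, repo_paths: dict[str, str]) -> list[str]:
    """repo_paths maps declared repo ids to checkout paths, per config.yml."""
    problems = []
    repo = task.get("repo")
    if repo is None:
        problems.append("missing repo: frontmatter")            # present?
    elif repo not in repo_paths:
        problems.append(f"unknown repo id: {repo!r}")           # valid id?
    elif not task.get("path", "").startswith(repo_paths[repo]):
        problems.append(f"task file not under {repo}'s tree")   # path matches?
    return problems
```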
Single-repo mode is the default and fully backward compatible. Without a config.yml, or with a config listing only one repo, every skill falls back to legacy single-repo behavior. Existing projects are unaffected until they opt in by creating the config.
What it feels like to use
You type /proceed REQ-258. Nine phases execute. Worktrees get created, branches get pushed, six reviewer agents fire in parallel at Phase 5, findings flow back into a fix loop, a canary smoke tests the deploy, /wrapup walks merge_order, and you come back to a merged PR with a consolidated ship summary. At no point during a clean run does the agent stop to ask you anything, because the four halt points are the only things it’s allowed to pause for.
Multiply that by five REQs in /sprint, spread across four repos each, and the outcome is the same — just more of it. Twenty PRs, merged in declared order, with /status giving you a real-time cross-repo activity view the whole time.
The ADLC is not Claude Code being clever. It is Claude Code executing a protocol that the protocol’s authors have made literally executable: enumerate the agents, tier the models, constrain the tools, validate every gate, retrieve before authoring, enumerate the halts, isolate the worktrees, mirror the configs, walk the merge order. Do that, and the agent ships features. Skip any of it, and you end up with a demo.