Commitment Points in AI-Assisted Development

Three constraint classes that separate the rework from the recoverable.

In any build session, decisions split into two categories. Most are reversible. Button labels, API endpoint names, component structure, even most refactors. A small set embed in places that cost a full rewrite to unwind: the data model, the service topology, where user state lives, what gets captured at event time. We call these commitment points.

Recognizing them is the skill. AI coding assistants make it urgent. When Claude Code produces 300 correct-looking lines in a turn, the friction that used to force pause vanishes. Each commitment used to cost keystrokes; now it costs a prompt. Speed surfaces the cost of missing checkpoints.

We went from instructions.md to a full AWS deployment of a web app, in one extended Claude Code session. Most of it was fast. Some of it was rework. The rework clustered on the decisions we did not pause on.

The concepts described here are packaged as a skill available at https://github.com/venkatperi/commitment-points-skill

Not every mistake compounds

A lot of what went wrong during the session cost us an afternoon and left no architectural trace. We shipped with a wrong Bedrock model ID as a CDK default. We had Lambda security groups with no database ingress because two stacks each created their own. Our .venv directories polluted CDK asset hashes, so source changes were not detected, and the Lambda ran stale code for an hour. CORS headers were missing on Gateway responses. Python's 3.10-style union syntax (X | Y in annotations) broke under Python 3.9 until we added from __future__ import annotations.

These hurt in the moment. They did not compound. A sharper linter, better docs, or one more dry run would have caught most of them. This piece is not about that category.

It is about the decisions that cost a rewrite. Three of them.

Example 1: The 29-second wall

We (Claude) built the question submission flow synchronously. User posts a question, the API Lambda analyzes it, calls Claude Sonnet to generate the response, returns the answer. Clean. Worked perfectly in local dev.

It shipped to staging and started returning 504s. The Sonnet interpretation call took 30 to 60 seconds. API Gateway kills any request past 29. The ceiling had existed the entire time we were coding. We did not hit it until production did.

The fix required async end-to-end. POST /questions creates the session in interpreting state, fires the Interpretation Lambda with InvocationType='Event', returns immediately. The Interpretation Lambda writes its result back to Redis on completion. The frontend polls /sessions/current every 2 seconds until the state flips to active, then renders.

That change touched seven files: the API route, the Interpretation Lambda, the Redis client, the Session type, a new polling hook, the QuestionForm progress UI, and the Home redirect logic. Then follow-up questions hit the same 29-second ceiling and we did the refactor a second time. Once async was established as the pattern, adding it to feedback write-backs, PDF generation, and question categorization was mechanical.
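The handoff has a simple shape. In a minimal sketch (names and storage are stand-ins, not the project's actual identifiers: a dict plays the role of Redis, and a plain callable plays the role of the boto3 invoke with InvocationType='Event'):

```python
import uuid

def post_question(question_text: str, sessions: dict, invoke_async) -> dict:
    """API route sketch: create the session in 'interpreting', fire the
    worker, return before the slow call runs. `sessions` stands in for
    Redis; `invoke_async` stands in for a boto3 Lambda invoke with
    InvocationType='Event', which returns as soon as the event is queued."""
    session_id = str(uuid.uuid4())
    sessions[session_id] = {"state": "interpreting", "question": question_text}
    invoke_async({"session_id": session_id})  # fire-and-forget
    return {"session_id": session_id, "state": "interpreting"}

def interpretation_worker(event: dict, sessions: dict, interpret) -> None:
    """Interpretation Lambda sketch: run the 30-to-60-second LLM call,
    write the result back, flip the state the frontend is polling on."""
    session_id = event["session_id"]
    result = interpret(sessions[session_id]["question"])  # the slow call
    sessions[session_id].update({"state": "active", "interpretation": result})
```

The frontend then polls /sessions/current until the state reads active; nothing in the request path waits on the model.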

What made the first version expensive was that the synchronous code looked correct. It read correctly, behaved correctly in tests, and carried no visible flag that the production boundary would reject it. Reading the AWS docs would have caught it. Reading our own code would not have.

Example 2: State that vanishes on device switch

Three times in the same session we put user state in the wrong place. Terms-and-conditions acceptance went into localStorage. Feedback votes (thumbs up, thumbs down) went into React useState. The active session tier preference started in component memory before migrating to the store.

All three broke the same way. User signs in on laptop, accepts terms, provides feedback on an interpretation. User opens the app on phone the next day. Terms prompt reappears. Feedback buttons look un-voted. Tier defaults to plain. The user complaint arrived quickly: “I gave feedback on one device, the same buttons show on another. Is the state browser local?”

Yes, it was.

The fix in every case looked identical. Add a column to Postgres. Write when the event happens. Return the field from /auth/me or /sessions/current. Remove the client-side persistence. Rewire the components to read from server truth.

Terms took a migration, a model function, an API route, CDK wiring, and a rewrite of the modal to POST to /auth/terms-accept and gate on user.terms_accepted_at. Feedback took a new table, a Redis write-back, an addition to the session payload, and a rewrite of FeedbackButton to accept initialVote as a prop. Each instance took about 45 minutes. The pattern repeated three times because we did not name it after the first.

The underlying question we should have asked for each piece of state: same user, new device, what should they see? If the answer is “their previous state,” then the state is server-owned. There is no MVP version of this that does not become a rewrite.

Example 3: Analytics you cannot recover

We wrote analytics at session finalization. question_analytics rows were inserted inside _finalize_session(), which runs when a user explicitly completes a session or triggers an emergency restart.

The 30-minute Redis TTL expires the majority of sessions. No explicit finalize, no analytics row. The admin dashboard showed session counts near zero. For three days we flew blind on category distribution, verdict rates, genuineness pass rates, and LLM cost.

The fix worked around the missing data by recomputing from credit_transactions, which is written atomically when a credit is deducted. Question creation events could be reconstructed from credit movement because that table was the source of truth for the business event. Category data was unrecoverable because no table captured it at creation time.
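A sketch of that backfill, assuming each question creation was atomic with a one-credit deduction (the field names here are hypothetical):

```python
from collections import Counter

def questions_per_day(credit_transactions) -> Counter:
    """Reconstruct question-creation counts from the credit ledger: every
    deduction row was written in the same transaction as a question, so
    negative amounts are a faithful proxy for the lost analytics rows."""
    return Counter(
        tx["date"]
        for tx in credit_transactions
        if tx["amount"] < 0  # deductions only; purchases are positive
    )
```

Category, by contrast, never touched credit_transactions, which is why it stayed unrecoverable.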

The decision point was write_analytics(finalize). It looked fine in isolation. Sessions that ended via finalization wrote rich analytics. Sessions that expired wrote nothing. We assumed finalization was the common path. It was not.

What we should have done: write an analytics row at question creation with whatever fields were known at that moment. Update it on finalize with follow-up count and duration. The discipline is write at event time, enrich later. Events are not recoverable upward.

The pattern: three constraint classes

These three failures sit in different places in the stack but share a structure. Each is a constraint that existed on day one, was knowable from docs or first principles, and went unsurfaced until code encountered it.

We have found three classes worth naming.

Boundary constraints. Limits imposed by the platforms the code runs on. API Gateway’s 29-second ceiling. Lambda cold start behavior. CDK asset hashing semantics. Browser storage scoping. These are named in documentation. They are invisible when the feature discussion focuses on what to build rather than where it will run.

Continuity constraints. What must persist across the boundaries a user crosses. Devices, sessions, refreshes, deploys, time. Every piece of state has an implicit answer to “what happens when the user comes back somewhere else.” That answer determines where the state must live. Getting it wrong is always a rewrite because the wrong storage layer cannot be retrofitted without also rewriting every reader.

Capture constraints. What must be written at event time because it cannot be reconstructed later. Analytics. Audit trails. Billing events. If the write is not atomic with the business event, the data is gone. You will be backfilling from proxies, or apologizing for numbers you cannot produce.

These three do not exhaust the category. They are the three we have hit on this build and the three we have hit on prior builds when we went back and looked. Microservices teams tend to fail on boundary first (timeout budgets, retry amplification) and capture second (distributed tracing wired post-incident). Consumer mobile tends to fail on continuity. ML platform work fails on capture (feature lineage) and boundary (GPU memory, batch size).

Which class bites first depends on the stack, and the answer for a given team is usually visible in its last two outage reports.

The practice: three checkpoints

Each class gets one checkpoint. Each checkpoint has a specific moment it fires.

Before choosing a service topology, walk the request path in production. Name every hop. Name every timeout. Identify which hop’s worst-case latency exceeds the ceiling of the hop above it. If the answer is “I don’t know yet,” the checkpoint is not complete and implementation does not start.
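The walk can even be made mechanical. A toy budget check (hops and numbers are illustrative, not measurements from the project):

```python
# Each hop: (name, timeout ceiling in seconds, own worst-case latency in seconds).
HOPS = [
    ("API Gateway", 29.0, 0.1),
    ("API Lambda", 60.0, 0.5),
    ("Sonnet interpretation call", 120.0, 60.0),
]

def first_hop_to_die(hops):
    """Return the first hop whose ceiling is exceeded by the worst-case
    latency of itself plus everything downstream, or None if all budgets hold."""
    for i, (name, ceiling, _) in enumerate(hops):
        downstream = sum(latency for _, _, latency in hops[i:])
        if downstream > ceiling:
            return name, ceiling, downstream
    return None
```

Run against the synchronous design from Example 1, the walk flags API Gateway immediately: sixty-plus seconds of downstream worst case against a 29-second ceiling. That is the 504, found on paper instead of in staging.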

Before choosing where any user-facing state lives, answer: same user, new device tomorrow, what should they see? If the answer is “their previous state,” the state is server-owned. Not “we’ll migrate later.” The migration is the rewrite, and the rewrite is exactly what the checkpoint exists to prevent.

Before writing any code that produces a business event, ask: what will I need to know about this in a month when I am debugging or reporting on it? Write that record at event time, in the same transaction as the business action. Enrich later via update.

The checkpoints run separately from feature planning. Mixing them collapses the discussion. You do feature planning, then you do constraint discovery, then you commit to an approach. The discipline is the separation.

Not every code change needs this. Renaming a component, adjusting a query, tweaking a prompt: free. The checkpoints trigger on three specific moves: introducing a new service or a new LLM call, introducing a new piece of user state, introducing a new business event type. When one of those is on the table, run the matching checkpoint before the code gets written.

Why AI-assisted development makes this urgent

When we hand-write code, the act of typing each function provides natural pauses. Each commitment gets thought about because each commitment costs keystrokes. Claude Code collapses that timeline. In our coding session, Claude wrote the synchronous API route, the Interpretation Lambda, the frontend mutation, and the types file in under two minutes. Correct-looking code at every layer. The 29-second ceiling at API Gateway was not visible in any of those files.

Speed did not create this problem. It exposed it faster. The same error would have taken three days to surface in a hand-written codebase. It took forty minutes with Claude Code. What used to be a Thursday afternoon bug became a pre-lunch rewrite.

The response is not to slow the generation down. It is to put the checkpoints at the commitment points and let fast generation happen inside those boundaries.

Prompts that surface each class

These go to Claude Code directly. Paste them in at the commitment point, before any file gets created.

For boundary constraints, before generating service code:

Before we write this, walk the request path in production. For each hop (client, CDN, API Gateway, Lambda, downstream services, LLM calls, DB), name the timeout and the p99 latency of the slowest call at that hop. Which hop dies first at worst-case latency? If the answer is “we don’t know yet,” name the docs I need to read or the measurements I need to take.

For continuity constraints, before generating state code:

For each piece of state this feature introduces, answer: same user, new device tomorrow, what should they see? If the answer is “their previous state,” write the plan for where it lives on the server, how it gets included in the user payload on login, and how the component reads it. If it is fine to lose, name that explicitly.

For capture constraints, before generating event handlers:

List every business event this code produces (user signs up, user submits X, payment succeeds, credit is deducted). For each event, what row gets written in which table in the same transaction? What fields would I need in a month to debug this or report on it? If any event has no atomic write, we stop and fix that first.

We run these as the first message in the planning phase. The output becomes the constraint document. Only then does implementation start.

The meta-skill

Constraint discovery is a practice, not an instinct. What we needed at each of the three failures in this piece was the discipline to pause and ask a specific question. The skill is recognizing when the pause is required.

Three triggers: a new service, a new piece of state, a new event type. Three checkpoints. Three prompts.

Everything else in the session can be generated fast. The commitment points cannot.


Commitment Points in AI-Assisted Development was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
