Prompt Debt: Building a Practical Workflow
70% of Our AI Prompts Weren’t Design Work. They Were ‘Damage Control.’
The old design loop was boring, honestly. You had a vision. You opened Figma. You shaped, incorporated critique, and you shipped.
Now you:
Open Figma → open Claude → link an MCP server → pull in a design system → write a prompt.
Ninety seconds later, something that would have taken a week in 2022 is on your screen, and the future finally feels like it arrived.
So you share it → fire 🔥 emojis in the channel → a stakeholder demo on Thursday. Your client, skeptical of AI for two years, lights up. “When can you ship it?” 👀
And that’s where the trouble starts.

The demo was the easy part
You come back to your desk. Reopen the canvas. And you notice the thing you’d been politely ignoring.
The demo was ten percent of the work.
The other ninety percent is sitting right there: visual clutter nobody asked for, features that appeared out of nowhere, numbers that don’t reconcile between the dashboard, the table, and the detail view.
So you prompt again. Fix A, B, C.
In fixing them, the model breaks D. Reinterprets E. You prompt again. And again. By the time the totals match across three screens, you’ve spent more hours fixing than you ever would have spent designing.
This isn’t a skill issue. It’s not a “you’re prompting wrong” problem, despite what every thread on the internet will tell you.
It’s the real cost that shows up once the demo smoke clears. We’ve started calling it prompt debt. 🧾
What prompt debt actually is
You asked for three things. The model gave you those three things → plus eleven you didn’t ask for.
Some of the extras are harmless. Some are load-bearing in ways you won’t discover until a stakeholder clicks on them. All of them have to be paid down before anything becomes production-grade.

It shows up in three places, in roughly this order:
1. Visually. The model loves to add things. Filter bars you didn’t scope. Empty states for data you don’t have. Avatars for a role that doesn’t exist. Every screen arrives slightly busier than you imagined, because the training data baked in some average idea of what a “good” dashboard looks like. The average is louder than your intention.
2. Logically. Features appear that belong to some other product. A “Recent Activity” card your spec never mentioned. A notification center your roadmap doesn’t have. These look plausible. That’s the problem.
Plausible generates conversations → conversations generate stakeholder expectations → expectations generate more prompts to fix what shouldn’t have existed.
3. In the data. The one that quietly eats weeks.
Dashboard shows 1,284 active accounts. Table shows 1,271. Detail view, for reasons nobody can explain, says 1,300. In the demo, nobody clicked deep enough to notice.
In production, reconciliation is the job. And you didn’t budget time for it, because you didn’t write it.
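Catching this before a stakeholder does takes embarrassingly little tooling. Here’s a minimal Python sketch of the reconciliation check we wish we had budgeted for. The loader functions are hypothetical stand-ins for wherever each view actually sources its data, hard-coded here to the mismatched numbers above:

```python
# A sketch of a cross-view reconciliation check. The loaders are hypothetical
# placeholders; the counts are the diverging numbers from the demo above.

def load_dashboard_count() -> int:
    return 1284  # what the dashboard widget renders

def load_table_count() -> int:
    return 1271  # row count behind the accounts table

def load_detail_count() -> int:
    return 1300  # the total the detail view derives on its own

def reconcile_active_accounts() -> None:
    counts = {
        "dashboard": load_dashboard_count(),
        "table": load_table_count(),
        "detail": load_detail_count(),
    }
    # One source of truth, three renderings. Anything else is prompt debt.
    if len(set(counts.values())) != 1:
        raise AssertionError(f"Active-account counts diverge: {counts}")

reconcile_active_accounts()
# AssertionError: Active-account counts diverge:
# {'dashboard': 1284, 'table': 1271, 'detail': 1300}
```

Run something like this after every regeneration and the mismatch surfaces in seconds, not in front of a stakeholder.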
The bill comes due
Experiment 01: September 2025. A B2B field-service and installation discovery project.
We did what half the industry has been doing.
Started in Figma Make → moved to Cursor → tried to ship the whole thing end-to-end.
The first walkthrough was the kind of thing that makes founders emotional. Two days. Work-order dispatch, technician mobile view, customer portal. We looked like magicians. 🪄

Then we tried to make it production-ready. 😭
Three weeks later, we were still fixing seams. Role-based permissions kept disagreeing with themselves. The work-order status model drifted every time we regenerated a screen. We were spending more time arguing with the codebase than building it.
In the end, we did the thing you’re not supposed to admit. We pulled it back into basic Figma to stop the damage. And even then? The front-end engineer who picked it up inherited a stylesheet so scattered across component variants that they ended up redefining the whole thing.
The demo had promised a product.
The reality had delivered a prototype wearing a production mask.
Experiment 07: Fast-forward to an AI-native transformation for a franchise-based service business. 225 operating franchises. Real enterprise stakes.
Two weeks in, we had something you could walk a franchise operator through and get real signal on. Stunning progress.
Then reality caught up. 📉
We ran two build cycles, toggling between Gemini 3.1, Sonnet 4.6 and Opus 4.6 to see which one held a long session best.
- Version one: 500+ prompts.
- Version two: 9,000+ prompts.
- Cleanup: ~70% of both.

Removing features that had grown on the UI like barnacles. Reconciling data across the dashboard, the franchise view, the operator view, the reporting exports. Deleting copy we never wrote.
And past a certain depth, roughly the two-hundredth prompt of a session, the models themselves start to fray.
Hallucinated data. Layers we hadn’t asked for. Ripple effects they stopped tracking.
Delete one card from the dashboard? The model forgets it propagated to four other screens. You go hunt it down manually. Same pattern in Figma Make and Lovable.
That 70% wasn’t design work. It wasn’t engineering work.
It was prompt debt servicing.
Why this keeps happening
Most teams still carry over a pre-AI mental model: the tool executes your intent.
A pen draws what you mean. A Figma component behaves how you wired it.
AI, despite the name, doesn’t execute your intent. It generates its best guess at your intent and hands you the output for correction. Every token it produced that you didn’t explicitly ask for is debt you now own.
The industry has oversold the generation side and undersold the correction side.
Every demo reel shows the magical ninety seconds. Nobody shows the three weeks afterward.
Teams plan their timelines on the ninety seconds. And go quiet when the three weeks arrive.
So what’s the shift?
Here’s where I’d love to hand you a silver bullet.
I don’t have one.
What my team has is a portfolio of workflows that each reduce prompt debt in different contexts. None eliminate it. All are worth testing against your timeline, your stakeholders, and your tolerance for rework.
1. Design first, build second. [Works to some extent, though you end up neither fully manual nor fully automated.] Commit to the shape of the thing in Figma. Every surface, every empty state, every data field. Then invite AI to build to spec. Generation becomes execution, not exploration. Prompt debt drops because the model has fewer degrees of freedom to invent on your behalf. Sharing a visual system diagram or user flow up front is part of the same discipline; it gives the model a reference point it can’t hallucinate around. This is what we defaulted to after the Cursor experience.
2. Prompt intent, not output. [Tried, but it failed for us.] Instead of describing the UI, describe the job the user is doing. The constraints they operate under. The decisions they need to make. The model still gets it wrong in some ways. But the corrections are about framing, not ornamentation. You’re disputing the thing itself, not sanding off the crust.
3. Thin-slice before you generate. [Promising.]
Write down what should not exist before prompting. No filter bar on this screen. No notification icon. No secondary nav. A prompt that includes its own scope boundary generates less debt than one that doesn’t; there’s a minimal sketch of such a prompt after this list. Understanding how global CSS works and defining rules up front is the same muscle; you’re constraining the invention space before the model starts generating.
4. Dual-track: AI discovery → Figma delivery. [Experimental.] Use AI to explore twenty directions in a week instead of two → get real stakeholder signal fast → move the winning direction into Figma for production-grade delivery.
AI’s speed where it’s genuinely an advantage (early, messy, reversible). Human craft where it genuinely matters (the last mile, where reconciliation counts and nobody forgives a bug).
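To make strategies 2 and 3 concrete, here’s a rough Python sketch of the prompt scaffold we mean. Nothing in it is a real API: build_prompt just assembles text you’d paste into whichever model you’re driving, and the intent and exclusion list are invented for illustration.

```python
# A scope-bounded, intent-first prompt (strategies 2 and 3). INTENT,
# MUST_NOT_EXIST, and build_prompt are illustrative, not a standard.

INTENT = (
    "A dispatcher needs to assign today's unassigned work orders to "
    "technicians without any technician exceeding eight scheduled hours."
)

MUST_NOT_EXIST = [
    "filter bar on this screen",
    "notification icon or notification center",
    "secondary nav",
    "any 'Recent Activity' card or feature not named above",
]

def build_prompt(intent: str, exclusions: list[str]) -> str:
    """State the job to be done first, then the scope boundary."""
    lines = [
        "Design one screen for the following job:",
        intent,
        "",
        "Scope boundary. Do NOT generate any of the following:",
    ]
    lines.extend(f"- {item}" for item in exclusions)
    lines.append("If something is ambiguous, ask instead of inventing.")
    return "\n".join(lines)

print(build_prompt(INTENT, MUST_NOT_EXIST))
```

The work is in the negative space: naming what must not exist removes the degrees of freedom the model would otherwise spend inventing on your behalf.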
Two habits carry across all four.
Run controlled conversations. Know how the model’s structure works and you’ll steer the output better. The designers who dump long, sloppy prompts into fresh sessions rack up debt fastest.
Treat your LLM like an intern. 😁 It needs your guidance, your constraints, and continuous iteration. It is not a senior designer you briefed on Monday and walked away from.
Every strategy worked for us. On the right project.
Every strategy has quietly failed when we misapplied it.
- Design-first slows you down when the brief is ambiguous and stakeholder feedback is the only way to sharpen it.
- Intent-first is a luxury when your stakeholders want pixels by Tuesday.
- Thin-slicing requires a discipline that’s hard to hold on day one of an engagement.
- Dual-track costs more than single-track on budgets that are already tight.
The real shift is seeing the debt form
If there’s a single move that’s changed how we work, it’s this:
We’ve started watching prompt debt accrue in real time. Same way an engineering team watches tech debt.
A screen that looked great in the demo but hides three unscoped features? Not a win. A bill.
A dashboard that matches your intent visually but can’t reconcile against the underlying data? Not a milestone. A deferred cost.
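The tooling for watching it can be trivial. Here’s a minimal sketch, with our own conventions baked in: tag every prompt you send as generation or correction, and keep the ratio where the team can see it. The category names and the 50% alarm are ours, not a standard.

```python
# A trivial prompt-debt ledger. Tag each prompt as you send it; watch the
# ratio the way an engineering team watches tech debt. The categories and
# the 50% threshold are our own conventions.

from collections import Counter

GENERATE = "generate"  # new, scoped design work
FIX = "fix"            # removing, reconciling, or deleting what you never asked for

class PromptLedger:
    def __init__(self) -> None:
        self.counts: Counter[str] = Counter()

    def log(self, kind: str) -> None:
        self.counts[kind] += 1

    def debt_ratio(self) -> float:
        total = sum(self.counts.values())
        return self.counts[FIX] / total if total else 0.0

ledger = PromptLedger()
for kind in [GENERATE] * 3 + [FIX] * 7:  # the shape of our Experiment 07 sessions
    ledger.log(kind)

print(f"Prompt debt ratio: {ledger.debt_ratio():.0%}")  # -> 70%
if ledger.debt_ratio() > 0.5:
    print("More fixing than designing. Stop prompting and rescope.")
```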
You don’t get to prompt your way out of prompt debt.
You get to choose, on every project, which pattern will minimize it. Given your timeline, your stakeholders, and the stakes of the system you’re building.
That choice is the skill. ⚖️
AI didn’t give us time back. It gave us a faster way to generate work someone still has to finish. The craft now is knowing what not to generate in the first place. And spotting the debt before it spots you. 👀
Happy to hear your thoughts :)