
Let me tell you about a moment that genuinely changed how I think about building data pipelines.
I was demoing Snowflake Cortex Code to a small group of engineers, showing how you can describe a business problem in plain English and get working SQL back in seconds. Five-table joins. The correct revenue formula. A full dbt pipeline from scratch. The whole thing. People were impressed — like, genuinely impressed.
And then someone asked a question I wasn’t ready for.
“That’s great. But six months from now, when the engineer who built this leaves — where does the intent live?”
That question stayed with me. Not because I didn't have an answer, but because I knew the answer wasn't good enough.
The honest answer was — it lives in the Snowsight chat history. Which nobody will ever find again.
It sat with me for weeks, and it led me somewhere I wasn't expecting. Not to a better prompt. Not to a different model. To something that needs to exist before Cortex ever opens.
The spec.
The Prompt is Doing Too Much Work
Here’s something we don’t say out loud enough about AI-assisted development: the prompt is ephemeral.
You type it, Cortex responds, code gets generated, and the prompt disappears into a chat thread. No version history. No review process. Nothing traceable.
The code survives. The reasoning doesn’t.
For a one-off query, fine. Nobody cares. But for production pipelines that real business decisions depend on — this is a problem. Six months later someone asks “why does this model use an INNER JOIN on LINEITEM instead of a LEFT JOIN?” and the answer is lost. Because it was in a sentence you typed into a chat box on a Tuesday afternoon.
What was missing wasn’t a better tool. It was something that lives before the tool. Something that captures intent — not just output. Something a human can review, another engineer can read, and Cortex can act on reliably.
That thing is the spec.
If I try to simplify it: a spec is just writing down the thinking before you write the code. That’s it. Nothing fancy. But somehow we keep skipping this step.
What Spec-Driven Development Actually Is
Spec-Driven Development is borrowed from software engineering. The idea is simple — write a detailed specification before any code is written. The spec is the single source of truth. Everything flows from it.
In data engineering with Cortex Code, this takes on a very specific shape. A good spec for a dbt model needs to cover six things:

The last one — acceptance criteria — is especially important. Acceptance criteria aren’t documentation — they become your dbt tests: not_null, unique, accepted_values, expression_is_true. They are the machine-readable definition of done.
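As an illustration of what that translation looks like, here is a hypothetical schema.yml for the mart model discussed later. The column names and accepted values are assumptions based on the TPCH dataset, and expression_is_true and the unique-combination test come from the dbt_utils package:

```yaml
version: 2

models:
  - name: mart_revenue_by_segment
    # Model-level tests: each one traces back to an acceptance
    # criterion in the spec, not to an engineer's memory.
    tests:
      - dbt_utils.expression_is_true:
          expression: "net_revenue >= 0"
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - market_segment
            - region
    columns:
      - name: market_segment
        tests:
          - not_null
          - accepted_values:
              values: ['AUTOMOBILE', 'BUILDING', 'FURNITURE', 'HOUSEHOLD', 'MACHINERY']
      - name: net_revenue
        tests:
          - not_null
```

When a criterion changes in the spec, the corresponding test changes with it — the two never drift apart silently.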
And here’s the part that makes this different from just writing good comments in your SQL. The Cortex prompt is derived from the spec. Not typed fresh from memory each time. Derived. Line by line. Traceable back to a section.
This sounds simple when you say it out loud. It only becomes concrete when you actually try building something.
Let Me Show You What This Actually Looks Like
We built a three-layer dbt pipeline on top of Snowflake’s TPCH supply chain dataset — 8 tables, 8.6 million rows, no documentation anywhere. Staging, intermediate, mart. All generated by Cortex Code from plain English prompts.
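A three-layer dbt project along those lines typically lays out like this — the file names here are assumptions based on the models described, not the actual generated project:

```text
models/
├── staging/          -- 1:1 cleanup views over the raw TPCH tables
│   ├── stg_orders.sql
│   ├── stg_lineitem.sql
│   ├── stg_customer.sql
│   └── ...
├── intermediate/     -- joins and shared business logic
│   └── int_order_revenue.sql
└── marts/            -- final, analyst-facing models
    ├── mart_revenue_by_segment.sql
    └── mart_revenue_by_segment.yml   -- tests derived from acceptance criteria
```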
The mart model — mart_revenue_by_segment — calculates net revenue by customer market segment and geographic region. Before we opened Cortex, before a single prompt was typed, we wrote the spec.
Look at Section 5 in that spec — Edge Cases. COALESCE(L_DISCOUNT, 0). NULLIF(total_orders, 0). INNER JOIN chosen intentionally to exclude orphan orders.
These are not SQL decisions. These are business decisions. And they are documented before anyone touched a keyboard, in a file that anyone on the team can read.
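To make those decisions concrete, here is a minimal sketch of the kind of SQL those spec choices produce. The TPCH column names are real; the CTE structure and staging model names are illustrative, not the actual Cortex output:

```sql
-- Illustrative sketch: the three edge-case decisions from the spec,
-- visible in the code they generate.
with order_lines as (
    select
        o.o_orderkey,
        o.o_custkey,
        -- COALESCE: a missing discount counts as zero, not a dropped row
        l.l_extendedprice * (1 - coalesce(l.l_discount, 0)) as net_line_revenue
    from {{ ref('stg_lineitem') }} l
    -- INNER JOIN: intentionally excludes orphan orders
    inner join {{ ref('stg_orders') }} o
        on l.l_orderkey = o.o_orderkey
)

select
    c.c_mktsegment                     as market_segment,
    r.r_name                           as region,
    count(distinct ol.o_orderkey)      as total_orders,
    sum(ol.net_line_revenue)           as net_revenue,
    -- NULLIF: no division-by-zero when a group has no orders
    sum(ol.net_line_revenue)
        / nullif(count(distinct ol.o_orderkey), 0) as avg_revenue_per_order
from order_lines ol
inner join {{ ref('stg_customer') }} c on ol.o_custkey = c.c_custkey
inner join {{ ref('stg_nation') }}   n on c.c_nationkey = n.n_nationkey
inner join {{ ref('stg_region') }}   r on n.n_regionkey = r.r_regionkey
group by 1, 2
```

Every non-obvious expression in that query maps back to a line in the Edge Cases section — which is exactly what makes the model reviewable later.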
The Cortex prompt at the bottom of that spec is one paragraph. Because all the thinking was already done before we got there. Cortex’s job was execution, not interpretation.
That’s the shift. Cortex is the engine. The spec is the blueprint. One doesn’t work properly without the other.
The Diagram That Made It Click
When I was pulling together the presentation for this demo, I kept trying to explain spec-driven development in words. “It’s a methodology.” “It’s a best practice.” Every time I said it, people nodded politely and moved on. It wasn’t landing.
So I stopped explaining it and drew it instead.
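A rough text sketch of the flow that picture showed:

```text
Business question
       │
       ▼
Spec  (intent, edge cases, acceptance criteria)  ◄── human review lives here
       │
       ▼
Cortex prompt  (derived from the spec, not typed from memory)
       │
       ▼
Generated SQL / dbt models
       │
       ▼
Review against the spec's acceptance criteria
```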

That’s when it clicked for me, too.
Without this step, Cortex keeps guessing… and we keep fixing.
The spec is what lets you say to Cortex: here is everything you need. Go do the work. I’ll review what you produce.
Sample Spec
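As an illustration of the shape such a spec takes, here is a condensed example for mart_revenue_by_segment. The Edge Cases section and the acceptance criteria are drawn from the decisions described above; the other section names and details are assumptions:

```markdown
# Spec: mart_revenue_by_segment

## 1. Business context
Net revenue by customer market segment and geographic region.

## 2. Sources
TPCH: LINEITEM, ORDERS, CUSTOMER, NATION, REGION.

## 3. Grain and outputs
One row per (market_segment, region): net_revenue, total_orders,
avg_revenue_per_order.

## 4. Logic
Net revenue = extended price x (1 - discount), summed per group.

## 5. Edge cases
- COALESCE(L_DISCOUNT, 0): missing discounts count as zero.
- NULLIF(total_orders, 0): no division-by-zero in averages.
- INNER JOIN on orders: orphan orders are excluded intentionally.

## 6. Acceptance criteria
- market_segment is never null (not_null)
- (market_segment, region) is unique
- net_revenue >= 0 (expression_is_true)

## Cortex prompt
Build a dbt model mart_revenue_by_segment implementing sections 1-6 above.
```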

What This Changes
If this is done properly, a few things change.
You don’t write prompts from scratch every time. You don’t depend on memory. And someone else can actually understand what you built.
That’s a big shift, especially for production pipelines.
Why This Matters More Than It Sounds
I want to be direct about something. This isn’t about process for the sake of process. It’s about what happens to the teams that figure this out versus the ones that don’t.
The teams who adopt spec-first thinking — who write the contract before they open Cortex — are going to compound in a way that’s genuinely hard to catch up with. Every spec they write gets sharper. Every pipeline they build has documented intent from day one. Every model is reviewable by someone who wasn’t in the room when it was created.
The teams still typing ad-hoc prompts into Snowsight chat are going to keep producing impressive outputs with no institutional memory behind them. New person joins — they start from scratch. Something breaks in production — nobody knows why the join was written that way. The original engineer left in Q3.
We’ve been here before. Different tools, same problem. The spec is how you don’t repeat it.
Cortex Code is genuinely powerful. But a powerful engine without a blueprint just runs fast in whatever direction you point it. The spec is how you point it right, consistently, by more than one person, across more than one quarter.
Final Thoughts
Six months later, when someone asks “why is this model built like this”…
the answer shouldn’t be in a chat history somewhere.
It should already be written.
That’s the whole point.
And honestly, this is not just about Cortex or one tool.
It feels like this is where things are heading in general.
The tools are getting better really fast. Writing SQL, building pipelines, generating models… all of that is becoming easier.
But the thinking part — what to build, why it exists, what decisions it should drive — still needs to be done properly.
If you are exploring Snowflake Cortex Code (CoCo) or planning to enable it for your team, I’m happy to connect and exchange thoughts.
I’ve been working closely with Snowflake and modern data platforms, and I’m always open to discussing real-world use cases, challenges, or different approaches that are working across teams.
Feel free to reach out on LinkedIn: https://www.linkedin.com/in/rahul-sahay-8573923/
CoCo Series Reference, in case you missed it: <Click Here>
The Missing Layer in AI Data Pipelines: Why Spec-Driven Development Matters was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.