For the First Time, I Watched Codex CLI Enter an Unfamiliar Territory

… and figure it out


I’ve spent years building, breaking, and wrestling with complex software systems. Like many engineers, I’ve grown accustomed to AI coding assistants that excel at autocomplete, generate boilerplate, or fill in the blanks when you feed them precise instructions. But what I witnessed recently was different. For the first time, I didn’t prompt Codex CLI to “write code.” I placed it inside a carefully designed system of expectations — and then watched it enter an entirely unfamiliar public web ecosystem, crawl it, parse unknown structures, and iteratively figure out how everything worked. No hand-holding step-by-step instructions. No hardcoded parsers. No brittle scripts that would break the moment the site changed its HTML.

It felt like crossing a line most people in software haven’t noticed yet.

Not Just a Model Upgrade

Let’s be clear from the start: this isn’t “Codex got faster” or “better at writing code.” It’s not an incremental improvement in autocomplete or a smarter version of “AI that writes code.” This is something more fundamental.

Traditional AI coding tools operate inside a narrow contract: you give detailed instructions, they generate output. The human remains the architect, the debugger, and the orchestrator. Codex CLI, when embedded in the right environment, behaved like an operator — an autonomous agent that could explore, hypothesize, test, fail, learn, and refine its approach until it stabilized.

I watched it do this in one of the messiest domains imaginable: crawling and parsing unknown public web sources.

Public websites are chaotic. They’re built by different teams using different frameworks, with constantly evolving HTML structures, anti-bot protections, dynamic JavaScript rendering, inconsistent data layouts, and no formal API contracts. Traditional scrapers die quickly here. Even sophisticated ones require constant maintenance.

Codex didn’t die. It started exploring.

The Inputs That Changed Everything

The magic didn’t come from better prompting. It came from providing Codex with a system of expectations rather than step-by-step instructions. Here’s what that system looked like:

1. Agent Contract
I defined how the agent was allowed to behave — its permissions, boundaries, and interaction patterns. What tools it could use, what actions were off-limits, and the ethical guardrails for public web interaction (respecting robots.txt, rate limits, etc.).

2. Domain Model
A high-level description of the kind of system it was entering: a public web ecosystem consisting of HTML pages, embedded scripts, APIs, JSON responses, pagination patterns, and data extraction needs. It wasn’t told how any specific site worked — just the general nature of web systems.

3. Task Definition
Clear success criteria: “Extract structured data about [topic] from unknown public sources. Produce clean, consistent output in a defined schema. Handle variations in page structure gracefully.”

4. Mission Constraint
A powerful directive: Don’t stop until it works. Not “try once and give up.” Not “write a script and call it done.” The mission was to achieve reliable data extraction from the target domain, no matter how many iterations it took.

5. No Hardcoded Rules
Crucially, there were no brittle URLs, no pre-written XPath selectors, no parser rules, and no scripted paths. Codex had to discover everything through interaction.
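The guardrails named in the agent contract (respecting robots.txt, rate limits) are straightforward to enforce mechanically. Here is a minimal sketch using Python’s standard-library `urllib.robotparser`; the robots.txt content and the `codex-crawler` agent name are invented for illustration, and in practice the agent would fetch the file from the target site before crawling anything else.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real agent would fetch it
# from the target site before touching any other URL.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url: str, agent: str = "codex-crawler") -> bool:
    """Return True only if robots.txt permits this agent to fetch the URL."""
    return parser.can_fetch(agent, url)

# Honour the site's requested delay between requests (rate limiting),
# falling back to one second if none is declared.
delay = parser.crawl_delay("codex-crawler") or 1

print(allowed("https://example.com/articles/1"))  # permitted path → True
print(allowed("https://example.com/private/x"))   # disallowed path → False
print(delay)                                      # → 2
```

Because the check runs before every request, the contract is enforced regardless of which paths the agent decides to explore.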

This combination shifted the paradigm from “tell the AI exactly what to do” to “define what success looks like, set the boundaries, and let it discover the path.”
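The five inputs above can be pictured as plain data handed to the operator. None of these field names come from Codex CLI itself; this is only a sketch of how the contract, domain model, task, and mission constraint fit together as one system of expectations.

```python
# Hypothetical "system of expectations" expressed as plain data.
# Field names and values are illustrative, not Codex CLI configuration.
EXPECTATIONS = {
    "agent_contract": {
        "allowed_tools": ["filesystem", "fetch", "playwright", "memory"],
        "off_limits": ["credentialed logins", "paywalled content"],
        "guardrails": ["respect robots.txt", "rate-limit requests"],
    },
    "domain_model": (
        "Public web ecosystem: HTML pages, embedded scripts, APIs, "
        "JSON responses, pagination patterns."
    ),
    "task": {
        "goal": "Extract structured data about the topic from unknown public sources",
        "output_schema": {"title": str, "date": str, "summary": str},
        "quality": "handle variations in page structure gracefully",
    },
    "mission_constraint": "iterate until extraction is reliable",
    "hardcoded_rules": None,  # no fixed URLs, selectors, or parser rules
}
```

Note what is absent: no URLs, no selectors, no parsing logic. Everything concrete is left for the operator to discover.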

What Happened Next Was the Game Changer

Codex didn’t ask for more instructions. It started inspecting the structure.

Using its available tools (filesystem access, fetch capabilities, Playwright for browser automation, memory for retaining context, and sequential thinking), it began mapping the unknown terrain.

I watched the iterative loop in action:

  • Inspect Structure: Analyze the raw HTML, DOM tree, network requests, and response patterns of the entry points.
  • Navigate Surfaces: Follow links, handle pagination, manage forms, and explore how data flows through the site.
  • Form Hypotheses: “This looks like a list page with cards containing titles, dates, and summaries. The data might be in these specific div classes… or perhaps loaded via this API endpoint.”
  • Test Assumptions: Attempt extraction, run small experiments, validate output against the expected schema.
  • Adjust When Wrong: When something broke (class names changed, data was nested differently, JavaScript rendered content dynamically), it didn’t panic. It refined its approach.
  • Refine Approach: Update its internal understanding and try again.

That loop isn’t execution — it’s iterative discovery under uncertainty.

And it wasn’t a one-shot process. It was continuous refinement until the output stabilized and met the mission constraints.
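The loop above can be sketched in a few lines. The sample HTML, the candidate hypotheses, and the success check are all invented for illustration; the real agent forms its hypotheses from live pages it has just inspected rather than from a fixed list.

```python
import re

# Toy stand-in for a page the agent has fetched and inspected.
SAMPLE_PAGE = """
<div class="card"><h2>Post A</h2><span class="when">2024-01-02</span></div>
<div class="card"><h2>Post B</h2><span class="when">2024-02-03</span></div>
"""

# Each hypothesis is a guess about where the data lives.
HYPOTHESES = [
    ("titles in <h1> tags", r"<h1>(.*?)</h1>"),
    ("titles in <h2> tags", r"<h2>(.*?)</h2>"),
]

def valid(records):
    """Success criterion: at least one non-empty record."""
    return len(records) > 0 and all(r.strip() for r in records)

def discover(page):
    for label, pattern in HYPOTHESES:
        records = re.findall(pattern, page)  # act on the hypothesis
        if valid(records):                   # inspect the result
            return label, records            # hypothesis confirmed
        # otherwise: correct course and try the next hypothesis
    return None, []

label, records = discover(SAMPLE_PAGE)
print(label, records)  # the <h1> guess fails; the <h2> guess succeeds
```

The point is not the regexes — the agent would generate far richer strategies — but the shape of the control flow: act, inspect, and only stop when the output validates.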

Adding MCP Servers: From Assistant to Operator

The real capabilities emerged when I connected Codex to MCP servers — providing real tools in real environments:

  • Filesystem: To understand and organize the project structure it was building.
  • Fetch & Playwright: Real web interaction — not simulated, but actual browser and HTTP-level access to live public sources.
  • Sequential Thinking: Breaking down complex crawling problems into logical, manageable steps.
  • Memory: Retaining context across iterations, avoiding repeated mistakes.
  • Context7: Retrieving relevant technical knowledge about web technologies, anti-scraping patterns, and parsing strategies.

With these real capabilities, Codex transformed from an “AI assistant” into an operator — a system that learns the target environment by interacting with it.

At that point, you no longer have passive AI help. You have something that actively discovers how the system works.

A Fundamental Paradigm Shift

Traditional software development follows a linear path:

Input → Process → Output

You write code that assumes a known structure, processes data according to fixed rules, and produces output. When the world changes (website redesigns, API updates), the code breaks and you fix it manually.

What I observed with Codex CLI is radically different:

Observe → Hypothesize → Act → Inspect → Correct → Repeat

It’s a scientific, adaptive process. The system is no longer told how. It’s told:

  • What success looks like
  • What boundaries exist
  • What it is allowed to touch
  • …and then it figures out the rest through iteration.

This is no longer “programming” in the classical sense. It’s designing conditions under which an intelligent operator can discover the solution.

It’s Not Perfect — But It Can Recover

Does Codex get everything right on the first try? No.

But here’s the breakthrough: it now knows how to recover when it’s wrong.

Traditional scripts fail silently or produce garbage when assumptions break. Codex, operating within the system of expectations, detects when output doesn’t match success criteria, traces back where its hypotheses failed, adjusts, and tries again.

That single capability — reliable recovery — changes everything. The system starts to converge on its own.
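Recovery can be made concrete with a small sketch: output is validated against the success criteria, and a failed validation triggers a revised strategy instead of returning garbage. The two strategies here are hypothetical stand-ins for extraction logic the agent would synthesize itself.

```python
EXPECTED_FIELDS = {"title", "date"}

def validate(record):
    """Success criterion: every expected field present and non-empty."""
    return EXPECTED_FIELDS <= record.keys() and all(record[f] for f in EXPECTED_FIELDS)

def strategy_v1(raw):
    # First hypothesis: the date lives under the key "date" (wrong here).
    return {"title": raw.get("heading"), "date": raw.get("date")}

def strategy_v2(raw):
    # Revised hypothesis after inspecting the failure: it's "published".
    return {"title": raw.get("heading"), "date": raw.get("published")}

def extract_with_recovery(raw):
    for strategy in (strategy_v1, strategy_v2):
        record = strategy(raw)
        if validate(record):
            return record  # output matches the success criteria: converged
        # trace back which field failed, adjust, and retry
    raise RuntimeError("all strategies exhausted")

raw = {"heading": "Post A", "published": "2024-01-02"}
print(extract_with_recovery(raw))  # → {'title': 'Post A', 'date': '2024-01-02'}
```

A traditional script would have shipped the first record, date and all, as `None`. The validating loop is what lets the system converge instead.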

The Bottleneck Has Shifted

The old bottlenecks in web scraping and data extraction were:

  • Writing the initial code
  • Remembering syntax and APIs
  • Wiring everything together correctly

Those are no longer the primary constraints.

The new bottleneck is defining the right operating model — the agent contract, domain model, task definition, and mission constraints. Everything else? The system starts figuring it out.

We’re no longer just telling machines what to do. We’re designing the conditions where they can figure it out themselves.

This Isn’t AGI Hype

Let me be very clear: this is not another round of AGI hype.

It’s something far more practical — and arguably more disruptive.

We’re not claiming machines have achieved general intelligence. What we have are machines that can enter unfamiliar systems and make sense of them through iteration.

In the domain of crawling and parsing unknown public web sources, this means:

  • Scrapers that adapt to site changes without human intervention
  • Data pipelines that discover new data sources autonomously
  • Extraction systems that handle the long tail of messy, inconsistent websites
  • Reduced maintenance burden as the operator learns and refines its understanding over time

It’s a new class of software: systems that discover rather than being explicitly programmed for every edge case.

If You’re Building Systems Today, Pay Attention

The interface between humans and machines is changing.

We’re moving from “instruction → machine” to “environment → discovery.”

If you’re building complex systems — especially those that interact with the chaotic real world like public web data — you need to start thinking differently:

  1. Design the right system of expectations, not just prompts.
  2. Define success clearly, including recovery mechanisms.
  3. Provide real capabilities (tools, memory, execution environments).
  4. Let the operator discover the path instead of scripting every step.

The future of software engineering isn’t just writing better code. It’s designing the architectures where intelligent operators can explore, learn, and stabilize complex behaviors on their own.

The Future Isn’t Just Written. It’s Discovered.

Watching Codex CLI navigate an unfamiliar web ecosystem — crawling unknown pages, hypothesizing data structures, testing extraction strategies, recovering from failures, and eventually producing reliable structured output — felt like witnessing a new chapter in how we build software.

We just crossed a line most people haven’t noticed yet.

This isn’t about faster code generation. It’s about a fundamental shift in what’s possible when you stop prompting an AI to write code and instead embed it inside a thoughtfully designed system of expectations.

Design the system. Set the expectations. Define success. Let Codex do the discovering.

The age of brittle, manually maintained scrapers for public web data is coming to an end. In its place, we’re seeing the rise of adaptive operators that can enter unknown territory and figure it out.

And that changes everything.


For the First Time, I Watched Codex CLI Enter an Unfamiliar Territory was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
