Human In The Context — Why AI Systems Stall at Scale

How Context-Assembled Execution (CAE) and the Dynamic Context Assembly Platform (DCAP) redefine how behavior is controlled in AI systems

Authored by David Coleman & Jose Antonio Marquez

Before DCAP vs. After DCAP

This paper describes a general system model for AI execution at scale. While informed by real-world experience, it is not specific to any single organization or implementation.

The Problem: Inconsistency at Scale

Organizations are rapidly moving from small-scale AI experimentation to broad, enterprise-wide adoption. What works for a handful of individuals — prompting, ad hoc workflows, locally defined tools — begins to break down as usage scales across hundreds or thousands of people.

At that scale, the problem is no longer access to capability. It is consistency of execution.

To address this, we introduce a model we refer to as Context-Assembled Execution (CAE).

In this model, behavior is not embedded in tools or reconstructed through prompting. Instead, it is assembled dynamically at the moment of execution from structured layers of context.

This paper describes CAE as a system pattern, and introduces DCAP (Dynamic Context Assembly Platform) as a reference architecture for implementing it.

Together, they define a shift in how AI systems operate at scale:

  • CAE defines the execution model
  • DCAP provides the system that makes it real

In large organizations, we’ve seen teams independently build overlapping solutions to the same problems. Similar tasks produce different outcomes depending on who invokes them. Organizational standards are applied unevenly, if at all. Context is copied, repeated, and lost. Costs grow unpredictably as systems rely on increasingly large and redundant inputs. What begins as acceleration at the individual level does not stabilize as it scales. It compounds.

Each new team introduces its own variations. Each variation becomes another branch of behavior to maintain. Context grows, but not coherently — copied across prompts, duplicated across tools, and redefined in slightly different ways. Over time, the system does not converge toward consistency. It diverges.

The Problem: Divergence at Scale — when teams build in isolation, similar work produces different outcomes, context is duplicated, standards drift and costs grow unpredictably

At scale, this creates a failure mode that is already beginning to surface in large organizations: they lose the ability to reason about how work is being performed. Outputs become unpredictable. Standards become advisory rather than enforced. Trust in automated systems erodes — not because they are incapable, but because they are inconsistent.

In many organizations, this transition is subtle. It does not appear as a single failure, but as a growing loss of alignment that becomes harder to correct over time. This is the point at which adoption stalls. This is the point at which costs grow exponentially. This is when the system fails — not due to lack of capability, but due to lack of control.

Why Current Approaches Don’t Readily Scale

This is not a limitation of the underlying models. It is a limitation of the systems built around them.

Most current approaches treat capabilities — skills, tools, agents — as the primary unit of design. Behavior is embedded directly into these capabilities, or reconstructed through repeated prompting at the point of use. Both approaches fail to scale. Embedding behavior leads to duplication and drift as teams fork and customize implementations. Prompting relies on human consistency that does not exist at enterprise scale.

What is missing is a consistent way to govern how work is executed without constraining who performs it.

This becomes immediately visible in even the simplest workflows. A task like creating a Jira ticket appears uniform, but in practice varies widely across teams — each with different fields, standards, and expectations. Without a consistent way to apply that context at execution time, behavior fragments, duplication increases, and outcomes diverge. The diagram below illustrates how this variation emerges — and why it compounds as systems scale.

Why Current Approaches Don’t Scale

A Different Approach: Context as the Source of Behavior

Most systems attempt to solve this problem by enforcing behavior directly.

They embed rules into tools, define rigid workflows, or rely on repeated prompting to shape outcomes at the moment of use. At small scale, this works. At enterprise scale, it breaks. Rules become brittle, tools diverge, and prompting becomes inconsistent. The system fragments because behavior has no stable place to live.

The approach described here is based on a different premise: that governance should not be enforced mechanically, but interpreted dynamically.

Instead of encoding behavior into capabilities, this model introduces a context assembly layer that defines how work should be performed at the moment it is executed. Organizational standards, team conventions, and individual expertise are not embedded in the tool — they are assembled, layered, and presented to the model as structured context.

There is no rigid rules engine enforcing behavior through fixed pass/fail logic. The model itself becomes the governing mechanism, reasoning over structured context the same way an experienced practitioner would — evaluating constraints, resolving conflicts, and applying judgment.

This shifts the role of the system entirely. Instead of enforcing behavior, it shapes it.

Capabilities remain simple. Context becomes authoritative. And governance moves from something the system executes to something the model understands.

Context as a source for a scalable Jira capability
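
To ground the Jira example, here is a minimal Python sketch of the idea (team names, fields, and labels are invented for illustration and are not part of any real configuration): one generic ticket-creation capability whose required fields and labels come from team-level context rather than from forked implementations.

```python
# Illustrative sketch: one generic "create Jira ticket" capability, shaped by
# team-level context instead of being copied and modified per team.

GENERIC_TICKET_FIELDS = {"summary", "description"}

# Team context layers (invented values, for illustration only).
TEAM_CONTEXT = {
    "payments": {
        "required_fields": {"component", "severity", "compliance_review"},
        "labels": ["pci"],
    },
    "mobile": {
        "required_fields": {"platform", "app_version"},
        "labels": ["release-train"],
    },
}

def create_ticket(team: str, user_input: dict) -> dict:
    """Compose the generic capability with the invoking team's context."""
    context = TEAM_CONTEXT.get(team, {})
    required = GENERIC_TICKET_FIELDS | context.get("required_fields", set())
    missing = required - user_input.keys()
    return {
        "team": team,
        "fields": {k: user_input.get(k) for k in sorted(required)},
        "labels": context.get("labels", []),
        "missing_fields": sorted(missing),  # surfaced instead of silently diverging
    }

print(create_ticket("payments", {"summary": "Refund fails", "description": "..."}))
print(create_ticket("mobile", {"summary": "Crash on launch", "description": "..."}))
```

This is a deliberately mechanical simplification: in the full model described below, the assembled context is interpreted by the model rather than merged in code. The point is where the variation lives — the capability stays generic, and the differences between teams live entirely in the context it consumes.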

Context-Assembled Execution (CAE)

Context-Assembled Execution (CAE) is a model for AI systems in which behavior is not embedded in capabilities but dynamically constructed at execution time from layered context.

Instead of encoding rules into tools or relying on repeated prompting, CAE externalizes behavior into structured context layers — such as company policy, organizational standards, and individual expertise — which are assembled and interpreted by the model at the moment work is performed.

This model mirrors the CSS cascade: context is composed top-down from layers of authority, while resolution occurs bottom-up, where more specific context overrides more general guidance.

In CAE:

  • Capabilities remain simple and reusable
  • Context defines behavior
  • Governance is applied dynamically, not enforced statically

This enables consistent execution at scale without sacrificing flexibility, allowing systems to converge over time rather than fragment.
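
A minimal sketch of the cascade, assuming invented layer names and contents (Python): layers are listed top-down in authority order, and a given question is resolved bottom-up, so the most specific layer that addresses it wins.

```python
# Sketch of the CSS-like cascade in CAE. Layer contents are invented.

LAYERS = [  # top-down: most general authority first
    ("company",    {"commit_style": "conventional-commits", "license_header": True}),
    ("org",        {"commit_style": "conventional-commits+ticket-id"}),
    ("team",       {"test_required": True}),
    ("individual", {"commit_style": "conventional-commits+ticket-id+scope"}),
]

def resolve(key, layers=LAYERS):
    """Resolve bottom-up: the most specific layer defining the key wins."""
    for name, layer in reversed(layers):
        if key in layer:
            return name, layer[key]
    return None, None

print(resolve("commit_style"))    # ('individual', 'conventional-commits+ticket-id+scope')
print(resolve("license_header"))  # ('company', True)
```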

Dynamic Context Assembly Platform (DCAP)

DCAP (Dynamic Context Assembly Platform) is a reference architecture for implementing Context-Assembled Execution.

It provides a runtime system that:

  • Collects context from layered sources of authority
  • Assembles that context dynamically at execution time
  • Injects it into model-driven capabilities
  • Produces governed outcomes based on interpreted context

DCAP functions as a control plane for AI execution, enabling consistent, policy-aware behavior across teams, systems, and environments.

How Authority Cascades and Evaluates in DCAP — Context is composed from layers of authority (top-down) and resolved by specificity (bottom-up), similar to CSS. More specific context overrides more general guidance.

Reference Architecture: Context-Assembled Execution

At a high level, the system consists of four components:

Capabilities:
Discrete functions that perform work, such as generating code, reviewing a change, or executing a workflow. These are typically skills or plugins in many AI systems today.

Context Layers:
Structured layers of guidance defined at different levels of authority: company, org, team, repo, individual. Each layer contributes uniquely to the constraints, preferences, and domain-specific knowledge necessary to execute a task.

Context Assembler:
A runtime component responsible for gathering and composing context from all of these layers at the moment of execution.

Execution Engine:
This is the model invocation itself, which receives:

  • The requested capability — for example, a pull request capability
  • The assembled context (for example, an engineer in the Rocket division at Acme Corp, operating within a repository with defined architecture patterns, ownership rules, and team standards)
  • The user’s input

The model interprets the combined context and produces an outcome.
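
The wiring between these components can be pictured with a short sketch (Python). Everything here is an illustrative assumption (the class names, the layer functions, and the stand-in model), not a prescribed DCAP API.

```python
from dataclasses import dataclass

@dataclass
class Capability:
    """A discrete function that performs work; intentionally abstract."""
    name: str
    instructions: str

# Context layers: ordered sources of authority (company -> team -> individual here;
# a real system would also include org and repo layers). Contents are invented.
def company_layer(user, action):    return {"policy": "all changes require an owner"}
def team_layer(user, action):       return {"standards": "Rocket division architecture patterns"}
def individual_layer(user, action): return {"preferences": "flag TODOs left in the diff"}

class ContextAssembler:
    """Gathers and composes context from all layers at the moment of execution."""
    def __init__(self, layers):
        self.layers = layers

    def assemble(self, user, action):
        # Keep layer attribution so the model can resolve specificity itself.
        return [(layer.__name__, layer(user, action)) for layer in self.layers]

class ExecutionEngine:
    """The model invocation: capability + assembled context + user input."""
    def __init__(self, assembler, model):
        self.assembler, self.model = assembler, model

    def execute(self, capability, user, user_input):
        context = self.assembler.assemble(user, capability.name)
        prompt = (f"{capability.instructions}\n"
                  f"Assembled context (most specific last): {context}\n"
                  f"User input: {user_input}")
        return self.model(prompt)

# Wiring it together with a stand-in model.
engine = ExecutionEngine(
    ContextAssembler([company_layer, team_layer, individual_layer]),
    model=lambda prompt: f"[model outcome based on]\n{prompt}",
)
capability = Capability("pull_request_review", "Review the change and recommend an action.")
print(engine.execute(capability, user="engineer@acme", user_input="diff ..."))
```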

Example: Pull Request Review

Let’s take an example of a common operation developers perform daily — reviewing a pull request.

In the traditional system, review logic lives in personal knowledge, or is automated as a prompt that gets copied, pasted, and tweaked. Maybe you have created a skill, and others have installed it and tweaked it to reflect their own style or focus.

What happens when we do this?

  • Review logic becomes embedded in scripts or prompts
  • Teams and individuals fork and modify behavior
  • Standards blur and drift over time

Let’s look at a new model:

A new model for AI systems

At runtime, these layers are assembled into a structured context with a preamble describing to the model how to interpret them. The capability has no embedded knowledge of the domain-specific context being injected.

What the capability is aware of is HOW, WHERE, and WHY it will consume assembled context. This means the skill is aware of its role in an enterprise-scale system, and it is intentionally abstract, deferring all non-essential decisions to the policy layers.

In our example, a pull request capability knows how to open a pull request, read the diff, and follow a few fundamental instructions, such as “suggest inline changes where appropriate, explain your rationale for feedback, and give the user a recommendation.” These do not change across an enterprise; they represent the most abstract definition of a pull request review. The key is that the capability knows it should load the effective policy for this user and apply it to those underlying fundamental tasks — a policy-aware capability understands its place in an enterprise.
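
A hedged sketch of what such a policy-aware capability might look like from the inside (Python; the preamble wording and the load_effective_policy helper are hypothetical, introduced only for illustration):

```python
# Sketch of a policy-aware pull request review capability. It carries only the
# abstract, enterprise-wide instructions; everything domain-specific arrives as
# assembled policy at execution time. Helper names here are hypothetical.

FUNDAMENTAL_INSTRUCTIONS = (
    "Open the pull request and read the diff. "
    "Suggest inline changes where appropriate. "
    "Explain your rationale for feedback. "
    "Give the user a recommendation."
)

PREAMBLE = (
    "The policy below was assembled for this user and this action from layered "
    "sources of authority. More specific guidance overrides more general guidance. "
    "Apply it as an experienced reviewer would: weigh constraints, resolve "
    "conflicts, and use judgment."
)

def load_effective_policy(user: str) -> str:
    """Stand-in for the context assembler: returns the user's effective policy."""
    return (
        "company: never approve a change that removes license headers\n"
        "team: every bug fix must include a test\n"
        f"individual ({user}): prefer small, focused suggestions"
    )

def review_pull_request(user: str, diff: str, model) -> str:
    """The capability knows HOW, WHERE, and WHY to consume assembled context."""
    policy = load_effective_policy(user)
    prompt = "\n\n".join([FUNDAMENTAL_INSTRUCTIONS, PREAMBLE, policy, f"Diff:\n{diff}"])
    return model(prompt)

print(review_pull_request("engineer@acme", "- old\n+ new",
                          model=lambda p: f"[review based on]\n{p}"))
```

Notice that the capability never encodes the policy itself; it only knows that an effective policy exists, where to place it, and why the model should interpret it.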

Key Properties of the Model

This architecture introduces several important properties:

Separation of concerns
Capabilities perform work. Context defines behavior.

Dynamic governance
Rules are evaluated and applied at execution time, not enforced statically.

Composability
New context layers can be added without modifying capabilities.

Convergence over time
Improvements to capabilities benefit all users.
Improvements to context refine behavior locally without fragmentation.

DCAP is a control plane for AI execution. It assembles the right context at the right moment, lets the model interpret it, and governs the outcome — action by action.

DCAP control plane architecture

Contrast With Existing Systems

Most current systems embed behavior in one of two places:

  • Inside the capability itself
  • Inside repeated prompts at invocation time

Both approaches lead to duplication and drift.

In this model, behavior is externalized and composed dynamically, allowing the system to scale without fragmentation.

How the System Scales: Layered Context

This separation has a direct impact on how capabilities evolve over time.

In most mature systems, variation does not live inside the capability itself. It is defined externally — through layered configuration, policy, and override. Infrastructure behaves differently across environments without being rewritten. Identity systems enforce global standards while allowing local exceptions. Interfaces adapt through cascading rules rather than duplication.

AI systems are currently missing this layer entirely.

In many environments, tools and scripts proliferate because variation has nowhere else to live. A capability that performs well in one team is copied and modified for another. Small differences in workflow, naming, validation, or policy become embedded directly into the implementation. Over time, the organization accumulates multiple versions of the same capability, each partially maintained and inconsistently applied.

This is not a new problem. It is a solved one — just not yet applied to AI. Many languages and systems have faced this evolution.

Early systems often embedded structure and behavior directly into rigid, verbose formats. XML became a standard for representing data and configuration, but at scale it proved difficult to work with — too heavy, too repetitive, and hard to evolve.

JSON emerged as a simpler alternative, reducing friction and making data easier to compose and consume. Later, formats like YAML further optimized for readability and layered configuration, allowing systems to express variation more cleanly without duplicating structure.

The pattern is consistent: as systems scale, structure moves out of rigid implementations and into more flexible, composable layers.

AI systems are now encountering the same transition.
What XML was to early systems, embedded behavior and prompt duplication are to AI today.
What JSON and YAML enabled — clear, layered, composable structure — is what context assembly introduces for AI execution.

Evolution of Data and AI Systems — Two evolutions. The same pattern.

When context is externalized, variation moves out of the capability and into the layers that define how it should behave in different situations. A single implementation can serve many contexts, shaped at execution time rather than rewritten at design time. Improvements to the capability benefit every team that uses it, while improvements to context refine behavior locally without creating divergence.

The system begins to converge instead of fragment. Capabilities stabilize. Context evolves. And investment compounds across the organization rather than being diluted by duplication.

Layered Authority and Context Composition

In systems that operate at scale, this kind of control is not achieved through a single source of truth, but through layers of authority.

This pattern is not unique to AI. It is how enterprise systems have long managed complexity — through structured hierarchies of policy, configuration, and override.

Global standards define what must always be true. Organizational and team-level guidance refine how those standards are applied. Local environments introduce specificity. Individuals retain the ability to adapt behavior based on context and experience. Each layer contributes, and each layer has a clear scope of influence.

The same structure applies here.

Without layering, governance collapses toward one of two extremes: it becomes too rigid to adapt, or too loose to enforce.

Context is assembled from a hierarchy of layers, each representing a different level of authority — from company-wide guidance to individual preference.

It is not loaded in advance or applied globally. It is assembled in response to action.

When work is about to be performed — whether generating code, reviewing a change, or executing a workflow — the system gathers the relevant context from each layer and composes it at that moment. The model receives not a static policy, but context that is specific to the action being taken.

When layers disagree, resolution follows the simple pattern already proven by CSS: more specific context overrides more general guidance, with the ability for any layer to define non-overridable constraints where required. This ensures that governance is applied precisely where it matters — at the point of execution — rather than carried broadly across unrelated work.

This is not a static configuration. It is a dynamic composition of guidance that reflects both global standards and local nuance — applied at the moment work is performed.
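
As a concrete illustration of the resolution rule, a small Python sketch with invented layer contents: specificity wins by default, but any layer can mark a value as non-overridable, much like !important in CSS.

```python
# Sketch: resolution by specificity, with non-overridable constraints.
# A value marked locked=True cannot be overridden by a more specific layer.

LAYERS = [  # ordered from most general to most specific; contents are invented
    ("company",    {"secrets_in_logs": ("forbidden", True),   # locked
                    "review_depth":    ("standard", False)}),
    ("team",       {"review_depth":    ("thorough", False)}),
    ("individual", {"secrets_in_logs": ("allowed", False),    # attempted override
                    "review_depth":    ("light", False)}),
]

def resolve(layers):
    effective, locked = {}, set()
    for layer_name, values in layers:             # walk general -> specific
        for key, (value, lock) in values.items():
            if key in locked:
                continue                          # locked by a more general layer
            effective[key] = value                # otherwise specificity wins
            if lock:
                locked.add(key)
    return effective

print(resolve(LAYERS))
# {'secrets_in_logs': 'forbidden', 'review_depth': 'light'}
```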

Redefining Contribution in AI Systems

Same Skill, Different Context, Different Outcome

This model changes how individual contribution is defined.

In traditional workflows, contribution is measured by the artifacts a person produces. As execution becomes increasingly automated, that definition breaks down. The value no longer lies in the output itself, but in the context that shapes how that output is generated.

In this system, individuals do not just produce results. They define how results are produced.

When individuals encode their judgment, preferences, and domain knowledge into their own context layer, their contribution persists across every task they perform. It compounds over time, shaping the behavior of otherwise generic capabilities and producing outputs that reflect their expertise rather than replacing it.

At the same time, this model does not require that expertise to be centralized. Individuals retain control over what remains local and what is contributed upstream. This preserves autonomy while still allowing collective improvement.

The result is a system where expertise is not embedded in isolated artifacts, but continuously applied — visible in every outcome, without needing to be manually reintroduced.

Conclusion: Context Defines Behavior

As AI systems take on a larger role in execution, the question is no longer what they are capable of, but how their behavior is shaped.

Systems that treat context as incidental will produce outputs that are technically correct but operationally inconsistent. Systems that embed context directly into tools will fragment as variation accumulates. Neither approach can sustain scale.

What is required is a model in which context is not an afterthought, but the primary mechanism of control.

This is not a new idea. It is how complex systems have always operated — through layered authority, dynamic composition, and context applied at the moment of action. AI systems have simply lacked this layer.

The result is a shift in how we think about execution itself.

Capabilities no longer define behavior. Context does.

Governance is no longer enforced by the system. It is understood by the model.

And scale is no longer achieved by standardizing tools, but by shaping the context in which those tools operate.

Once this layer exists, the problem changes.

The question is no longer how to build better capabilities.

It is how to define the context that makes those capabilities behave correctly — every time, across every environment, for every user.

That is the system. That is how AI behaves consistently at scale.

