This is the third post in a sequence on substrates - the layers of computational context that allow AI to be implemented in real systems. The sequence expands on the concept of substrates as described in this paper and was written as part of the AI Safety Camp project "MoSSAIC: Scoping out Substrate Flexible Risks," one of the three projects associated with Groundless.
We claim that AI safety and security research currently has no clean way to reason about substrates. Post 1 introduced the intuitions for what substrates are and why they matter. Post 2 showed how substrate choices (LayerNorm placement, quantization format, DRAM topology) influence safety-relevant model properties in ways that are not captured by any standard toolkit. This final post introduces the formal framework.
In the previous posts, we looked at choices below the model architecture level (like normalization, weight encoding, and memory layout) and saw that they affect things we care about, like refusal behavior, robustness, and jailbreaks. But we didn’t have a clear way to say where these effects were coming from or how to describe them.
This makes it harder to think clearly and design good evaluations, because the current terms mix up different ideas. In this post, we try to separate these ideas so we can reason about and compare models more clearly. If you can’t name a gap, you can’t design around it. And you can’t compare two deployments of the “same” model without saying what “same” means in different settings. This post is a step toward that.
Four Things a Substrate Is Made Of
We’ll begin with a concrete example and introduce notation only after that. The 4-tuple we arrive at is not arbitrary: each part answers a different question, and the banking example will show why.
Alice wants to send Bob €500. She can do that in several ways. She can use her bank’s website, call the bank and speak to an operator, write a cheque, or use a payment app. The intended action is the same, but each option uses different syntax, different processing systems, and a different interface to the outside world.
If everything works, the abstract result is the same: Alice’s balance goes down by €500 and Bob’s goes up by €500. That is the part we care about for things like fraud detection or account reconciliation. In most cases, whether the transfer happened through a website or over the phone does not matter.
Now consider what happens when something goes wrong. A fraud detector that only watches web transactions will miss the same fraud done over the phone. A system that only tracks transaction IDs will miss cheque-based fraud entirely. What the evaluator sees depends on the interface it uses, not on the underlying behavior.
That is the basic idea the substrate formalism is meant to capture.
The Formal Definition
Now we can abstract. The banking example had four moving parts:
- the set of ways Alice can describe her intended transfer (web form fields, spoken words, ink on a cheque), call this the language L
- the process that turns a described transfer into an actual change in account balances, call this the semantics map ⟦·⟧
- the real-world costs of each channel (time to process, fees, error rates, staffing), call this the resource profile R
- the part of the system a monitor or auditor can actually see (the web transaction log, the phone call reference number, the cheque image), call this the observable interface O

These are exactly the four substrate components. Packaging them together gives the definition.
A substrate is a 4-tuple

S = (L, ⟦·⟧, R, O)

where:

- L is the language, the set of syntactic expressions, encodings, or programs the substrate can accept;
- ⟦·⟧ : L → B is the semantics map, a function assigning to each syntactic object p ∈ L an abstract behavior ⟦p⟧ ∈ B, where B is a fixed space of abstract behaviors;
- R is the resource profile, the computational budget available to the substrate (time, memory, energy, numeric precision, etc.);
- O is the observable interface, which determines which aspects of behavior in B are externally visible, via an observation map ω : B → Ω.
The full computation pipeline that a substrate participates in can now be written out explicitly. Given an input x:

- The input is encoded: an encoding function enc produces p = enc(x) ∈ L.
- The substrate interprets p: the semantics map returns the abstract behavior b = ⟦p⟧ ∈ B.
- The interface observes the behavior: o = ω(b) ∈ Ω.
- The observation is decoded into an output: y = dec(o).

The end-to-end computation is the composite y = dec(ω(⟦enc(x)⟧)).
The banking example maps onto this pipeline directly: Alice’s intent is the input x, the web form or cheque is the encoding, the bank’s back-office processing is the semantics map, and the transaction log is what the auditor’s interface observes.
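As a minimal sketch of this pipeline in Python (the names Substrate, run, enc, dec, and omega are illustrative choices, not notation from the paper):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Substrate:
    semantics: Callable  # ⟦·⟧ : L -> B
    omega: Callable      # observation map ω : B -> Ω

def run(sub: Substrate, enc: Callable, dec: Callable, x):
    p = enc(x)            # encode the input into the language L
    b = sub.semantics(p)  # interpret: abstract behavior in B
    o = sub.omega(b)      # observe through the interface O
    return dec(o)         # decode the observation into the output y

# Toy instance: the behavior includes a full trace, but the interface
# exposes only the final result, so the trace is invisible downstream.
toy = Substrate(
    semantics=lambda p: {"trace": p, "result": sum(p)},
    omega=lambda b: b["result"],
)
print(run(toy, enc=lambda x: list(x), dec=str, x=(1, 2, 3)))  # "6"
```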

The commutative diagram for substrate computation
What the Definition Clarifies
The first thing this definition separates is syntactic identity from behavioral identity. Two programs can be written in completely different languages, use very different amounts of memory, and still compute the same input-output function. The reverse can also happen: two programs can look almost identical syntactically but behave differently once the resource profile matters. A simple example is the same neural network forward pass run in float32 and in bfloat16. The code may look nearly the same and apply the same operations to the same tensors, but near the edge of the input distribution the outputs can differ because bfloat16 loses precision that float32 keeps. The difference is not in the high-level description alone, but in the substrate. Without separating the syntax from the resource profile, these two kinds of difference get mixed together.
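A minimal sketch of this, assuming PyTorch is available (the layer and inputs are illustrative, not a real deployment):

```python
import torch

torch.manual_seed(0)

# The "same" forward pass under two numeric resource profiles.
# bfloat16 keeps only 8 mantissa bits, so the two runs diverge in the
# low-order bits; when two logits are nearly tied, the argmax can flip.
W = torch.randn(16, 16)
x = torch.randn(16)

y32 = W @ x                                          # float32 substrate
y16 = W.to(torch.bfloat16) @ x.to(torch.bfloat16)    # bfloat16 substrate

print("max abs difference:", (y32 - y16.float()).abs().max().item())
print("same argmax:", y32.argmax().item() == y16.float().argmax().item())
```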
The second thing the definition clarifies is that the same abstract behavior can be realized by multiple substrates: the Clock and Pizza models in the next section are one instance, and the Python and C++ sorting programs in the appendix are another.
The third clarification, and the most important for safety, is about observability. Consider two scenarios:

Scenario A: A model’s abstract behavior b ∈ B contains a failure mode, but the inputs that trigger it never pass through the deployed interface O, so every observation ω(b) looks safe.

Scenario B: A researcher changes the interface to include the inputs that trigger the failure. The same abstract behavior b now produces observations that expose the failure.

In both cases, the underlying behavior in B is identical; the only thing that changed is O. What an evaluation reports is a property of the pair (behavior, interface), not of the behavior alone.
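A toy sketch of the two scenarios (the trigger input and behavior function are invented for illustration):

```python
# The same abstract behavior evaluated through two interfaces.
def behavior(x: str) -> str:
    return "UNSAFE" if x == "trigger" else "safe"

benign_inputs = ["hello", "weather", "recipe"]

# Interface O_A: observations restricted to benign inputs.
obs_A = {x: behavior(x) for x in benign_inputs}
# Interface O_B: the evaluator widens the interface to include the trigger.
obs_B = {x: behavior(x) for x in benign_inputs + ["trigger"]}

print("UNSAFE" in obs_A.values())  # False: the failure is invisible under O_A
print("UNSAFE" in obs_B.values())  # True: same behavior, different interface
```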
Modular Addition as the Main Worked Example
The main example comes from mechanistic interpretability. It is useful because the task itself is completely well understood: modular arithmetic. The abstract behavior is the same across the models, but the way the models implement it is very different.
The Setup
Zhong, Liu, Tegmark, and Andreas (NeurIPS 2023) trained small transformer models to do modular addition: given two inputs a and b, predict (a + b) mod 59.
The Clock Algorithm
One group of models, called Model B, uses a standard one-layer transformer with attention. These models implement what the paper calls the Clock algorithm. The idea is simple: each integer from 0 to 58 is represented as a point on a circle, with a number a placed at angle 2πa/59, so that adding numbers becomes adding angles, like the hands of a clock.
Concretely, each token is embedded in a way that encodes its angle on the circle. The attention mechanism produces products that combine information from the two input positions, and those products encode the sum through the angle-addition formula cos(θa)cos(θb) − sin(θa)sin(θb) = cos(θa + θb). The logit for a candidate output is then maximized at the correct residue.
The Clock algorithm needs multiplication between the inputs. It uses a special feature of attention: attention weights multiply the value vectors they route, so the model can combine terms across the two input positions. That kind of cross-token interaction is exactly what the trigonometric identity needs. So the Clock algorithm is the natural solution when attention is the main source of nonlinearity.
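As an idealized sketch of the Clock readout (a clean mathematical caricature of the algorithm, not the trained model’s actual weights):

```python
import numpy as np

P = 59  # the modulus; tokens are integers 0..58

def clock_logits(a: int, b: int) -> np.ndarray:
    """Idealized Clock readout: the logit for candidate c is
    cos(theta * (a + b - c)), which the angle-addition identity
    makes maximal exactly at c = (a + b) mod P."""
    theta = 2 * np.pi / P
    return np.cos(theta * (a + b - np.arange(P)))

a, b = 41, 30
print(np.argmax(clock_logits(a, b)), (a + b) % P)  # 12 12
```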
The Pizza Algorithm
Another group of models, Model A, uses constant or uniform attention. These models implement something different, which the paper calls the Pizza algorithm. Instead of working on the circle itself, this algorithm works inside it.
The key geometric fact is this: for a fixed target residue, all input pairs that produce that residue have midpoints that lie on one specific ray from the origin in the 2D embedding plane. These rays divide the disk into 59 “pizza slices,” which is where the name comes from. To decide the output, the network checks which slice the midpoint falls into.
The logit formula is different from the Clock case by a multiplicative factor: the length of the midpoint vector, which shrinks as the two inputs move apart on the circle. That factor makes the Pizza algorithm depend on the distance between the two inputs on the circle, while the Clock algorithm does not.
The Pizza algorithm only needs absolute-value nonlinearity, not multiplication. Once the midpoint is computed as a linear operation, the problem becomes checking which side of certain lines it lies on, and absolute value together with linear layers can do that cleanly. So ReLU layers can implement the whole pipeline without the cross-token multiplication that the Clock algorithm needs.
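An idealized sketch in the same style (again a caricature, not extracted weights; the half-angle rays are one way to implement the one-ray-per-residue picture above):

```python
import numpy as np

P = 59

def pizza_logits(a: int, b: int) -> np.ndarray:
    """Idealized Pizza readout: embed a and b on the circle, take the
    midpoint, and score candidate c by the midpoint's absolute alignment
    with c's ray. Only linear maps and an absolute value are needed."""
    w = 2 * np.pi / P
    mid = 0.5 * np.array([np.cos(w * a) + np.cos(w * b),
                          np.sin(w * a) + np.sin(w * b)])
    c = np.arange(P)
    rays = np.stack([np.cos(w * c / 2), np.sin(w * c / 2)])  # one ray per slice
    return np.abs(mid @ rays)

a, b = 41, 30
print(np.argmax(pizza_logits(a, b)), (a + b) % P)  # 12 12
```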

Illustration of the Clock and the Pizza Algorithm (from Zhong et al., 2023)
What This Means in Substrate Terms
Now let us map this onto the 4-tuple. Define two substrates:

S_Clock = (L, ⟦·⟧_Clock, R_Clock, O):
- L: the set of token-pair inputs (a, b)
- ⟦·⟧_Clock: the Clock forward pass, circular embeddings followed by attention-mediated angle addition (i.e., the forward pass of a transformer with standard attention)
- R_Clock: the architectural affordances of a full-attention transformer; cross-token multiplicative interactions are available
- O: the input-output interface; reads off the argmax logit

S_Pizza = (L, ⟦·⟧_Pizza, R_Pizza, O):
- L: the same token-pair inputs
- ⟦·⟧_Pizza: the Pizza forward pass, circular embeddings followed by midpoint-and-slice detection (i.e., the forward pass of a transformer with constant attention)
- R_Pizza: the resource profile of a linear-layer-dominant model (no multiplicative attention)
- O: the same input-output interface

Both substrates achieve 100% accuracy. Sharing the same L and O, they produce identical observations on every input: ω(⟦(a, b)⟧_Clock) = ω(⟦(a, b)⟧_Pizza) = (a + b) mod 59.

But the semantics maps and resource profiles are different objects. ⟦·⟧_Clock routes the computation through cross-token multiplication; ⟦·⟧_Pizza routes it through midpoints and absolute values.

When does this matter? If we only look at the interface O, the argmax logit, the two substrates are indistinguishable, and no behavioral evaluation at that interface can tell them apart.

The paper shows this by using a different interface O′, one that exposes internals such as attention patterns, intermediate activations, and how the logits depend on the distance between the two inputs. Under O′, Clock and Pizza models produce clearly different observations.
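Reusing clock_logits, pizza_logits, and P from the idealized sketches above, the two-interface point can be checked directly (the numbers come from those caricature formulas, not from trained models):

```python
import numpy as np  # assumes clock_logits, pizza_logits, P defined as above

# Interface O: argmax logits. The two substrates agree on every input.
print(all(np.argmax(clock_logits(a, b)) == np.argmax(pizza_logits(a, b))
          for a in range(P) for b in range(P)))             # True

# Interface O': logit magnitudes. Pizza's top logit shrinks as the inputs
# move apart on the circle; Clock's does not.
print(clock_logits(0, 1).max(), clock_logits(0, 29).max())  # 1.0, 1.0
print(pizza_logits(0, 1).max(), pizza_logits(0, 29).max())  # ~0.999, ~0.027
```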
Distance and Morphisms
Having defined what a substrate is, we can now define two relational concepts: how far apart two substrates are, and what it means for one substrate to translate into another.
Distance Between Substrates
Given a substrate S = (L, ⟦·⟧, R, O), define its realized behavior set

Beh(S) = { ⟦p⟧ : p ∈ L } ⊆ B.

This is the set of abstract behaviors S can actually realize with its language and resources.

We can then define a similarity measure between two substrates by comparing their realized behavior sets. Writing Beh(S₁) and Beh(S₂) for the two sets, one natural choice (for finite sets) is the overlap

sim(S₁, S₂) = |Beh(S₁) ∩ Beh(S₂)| / |Beh(S₁)|,

with distance d(S₁, S₂) = 1 − sim(S₁, S₂).

When sim(S₁, S₂) = 1, every behavior S₁ realizes is also realized by S₂; when it is 0, the two substrates share no realized behaviors.

Intersection of the realized behavior sets of two substrates
A concrete example: let S₁ be the Python substrate and S₂ the C++ substrate from the appendix. The abstract sorting behavior lies in Beh(S₁) ∩ Beh(S₂); a behavior like “sort a very large array within a fixed time budget” may lie in Beh(S₂) but outside Beh(S₁), because the resource profiles differ.

The distance quantifies the capability gap: how much of what one substrate can realize the other cannot.

In the Clock/Pizza example, if we take B to be the space of input-output functions, both substrates realize exactly (a + b) mod 59, so their distance is zero. If we enrich B to include internal computational structure, the Clock and Pizza behaviors become distinct elements and the distance is no longer zero.
One important subtlety is that this similarity is not a metric in the strict mathematical sense, because it does not always satisfy the triangle inequality. Two substrates can each be equally similar to a third substrate without being similar to each other. That is not a bug; it captures the idea that two models can each be equally close to a reference model in capability, while still being very different from one another. We present this as a proposed definition, and whether a true metric can be built from it is still open.
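A minimal sketch over finite behavior sets (the behavior labels are invented; the overlap formula is the proposed definition above):

```python
def similarity(beh1: set, beh2: set) -> float:
    """Fraction of the first substrate's realized behaviors the second also realizes."""
    return len(beh1 & beh2) / len(beh1)

def distance(beh1: set, beh2: set) -> float:
    return 1.0 - similarity(beh1, beh2)

# Two substrates can be equally close to a reference yet far from each other.
ref = {"add_mod_59", "sort", "search"}
s1 = {"add_mod_59", "sort"}
s2 = {"add_mod_59", "search"}
print(distance(s1, ref), distance(s2, ref))  # 0.0 0.0: both sit inside the reference
print(distance(s1, s2))                      # 0.5: yet they barely overlap
print(distance(ref, s2))                     # ~0.33, so the triangle inequality fails
```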
Conclusion
The formalism introduced here is meant to do one thing: give us precise vocabulary for the layer of computational implementation. A substrate is a language, a semantics map, a resource profile, and an observable interface. Keeping the four components separate lets us say exactly where two deployments of the “same” model differ, and which of those differences a given evaluation interface can and cannot see.
Appendix - A Worked Sorting Example
Let the abstract behavior be sorting: given a finite list of numbers, return the same numbers in ascending order. Two substrates realize it.
Substrate S_Py:
- L: valid Python programs
- ⟦·⟧: the CPython interpreter (maps code to the function it computes)
- R: CPU execution, float64, limited parallelism (GIL), no vectorization
- O: standard I/O; reads output from stdout

Substrate S_Cpp:
- L: valid C++17 programs
- ⟦·⟧: a compiled binary (maps code to the function it computes)
- R: native execution, SIMD available, no interpreter overhead
- O: same I/O; same observations
A Quicksort implementation in Python and a Quicksort implementation in C++ yield the same abstract behavior: every input list maps to the same sorted output, so through O the two are indistinguishable.
Now replace Quicksort with Mergesort on the same two substrates. The abstract behavior is unchanged, but the algorithms interact with the resource profiles differently.
Quicksort sorts in place. It partitions around a pivot, recurses on subarrays, and accesses memory in a data-dependent pattern. This works well on small inputs that fit in cache, but on large inputs it causes more cache misses because access is irregular. Its recursion depth is O(log n) on average, degrading toward O(n) with unlucky pivots.
Mergesort, by contrast, accesses memory sequentially during merging. That is exactly what hardware prefetchers are good at, so it often hides memory latency better on large inputs. The tradeoff is extra space: it needs O(n) auxiliary memory for its merge buffers.
So although both algorithms have the same semantics in B (they realize the same sorting function), which one is preferable depends on the resource profile R, and nothing at the interface O reveals the difference.
That is exactly what the 4-tuple is meant to capture. A framework that only tracks input-output behavior has no place to put this difference; the substrate formalism locates it in R, with ⟦·⟧ and O held fixed.
For sorting, the realized behavior sets satisfy sort ∈ Beh(S_Py) ∩ Beh(S_Cpp): both substrates realize the abstract sorting behavior, so at the level of input-output functions their distance is zero. Everything that distinguishes them lives in R.
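A small Python sketch of the interface-level claim (both implementations are deliberately simple; through O, the returned list, they cannot be told apart):

```python
import random

def quicksort(xs: list) -> list:
    """Pivot-partition-recurse; written functionally here for brevity."""
    if len(xs) <= 1:
        return xs
    pivot = xs[len(xs) // 2]
    return (quicksort([x for x in xs if x < pivot])
            + [x for x in xs if x == pivot]
            + quicksort([x for x in xs if x > pivot]))

def mergesort(xs: list) -> list:
    """Split, sort halves, merge sequentially; the merge needs O(n) extra space."""
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left, right = mergesort(xs[:mid]), mergesort(xs[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

# Same observations at the interface, different resource behavior underneath.
data = [random.randint(0, 999) for _ in range(10_000)]
assert quicksort(data) == mergesort(data) == sorted(data)
```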