Semantic Phonons: Lattice Vibrations in AI Internals

Phonons refer to collective vibrational modes in crystals, propagating through the lattice as a wave.

Introduction

One of the most pressing questions in AI research is how language models represent meaning. How does a model know that "excellent" is better than "good"? How does a model represent time scales, e.g., hour vs. day? Where does the knowledge live that July comes after June? Language models are trained to predict text, and they are very good at it, but our understanding of their internal representations remains limited. Figuring this out is the holy grail of mechanistic interpretability: the quest to open the black box and understand what is going on inside.

There is now growing evidence that semantics are not stored as arbitrary, unstructured patterns. Instead, we find that models encode meaning in predictable geometric structures. A recent example comes from Karkada et al. (2026), who showed that the twelve months of the year form a near-perfect circle in the activation space of a language model (Gemma 2B). The months are not just close to each other in some vague sense, but are arranged in an ordered loop, reflecting the cyclical structure of the year. The question is whether this result is a coincidence or reflects something deeper.

Though they do not note it explicitly, what the authors did in their paper is apply mathematical frameworks established decades ago in the field of solid-state physics, in particular for the study of phonon modes. Phonons describe how vibrations propagate through crystals, and the mathematics behind them provides a rich framework for understanding collective behavior on discrete lattices. The case of the months of the year is one of the simplest structures one can deal with: periodic boundary conditions, where the first and last atom of the chain are coupled to each other, as if bending the chain into a ring. Crucially, physics offers an entire arsenal of further-reaching tools, covering different boundary conditions and higher-dimensional lattice geometries. If the geometry of something like the months of the year can be described through this lens, there is a chance to derive significantly more semantic structure from the same principles. In this post, I lay out the core ideas and share early experimental results for four different boundary condition types.

Background

What are phonons?

A crystal is not a static object. Every atom sits in a potential well created by its neighbors, and at any nonzero temperature, these atoms vibrate. The vibrations do not happen independently: since the atoms are coupled, a disturbance at one site propagates through the lattice as a wave. These collective vibrational modes are called phonons. The insight is that even though the underlying system is a discrete lattice of atoms, the collective behavior can be described by smooth wave functions. A phonon with wavevector $k$ has a spatial profile $u(x) \propto e^{ikx}$, and its frequency $\omega(k)$ depends on $k$ through a dispersion relation. What makes phonons particularly useful as a conceptual tool is the role of boundary conditions. The shape of the vibrational modes depends not just on the coupling between atoms, but also on what happens at the edges of the system. Different boundary conditions produce qualitatively different mode shapes, and each corresponds to a distinct geometric prediction.
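To make this concrete, here is a minimal numerical sketch (my own illustration, not part of the cited analysis): diagonalizing the dynamical matrix of a harmonic ring with unit masses and unit spring constants reproduces the textbook dispersion relation $\omega(k) = 2|\sin(k/2)|$.

```python
# Sketch: eigenfrequencies of a 1D harmonic ring vs. the analytic
# dispersion relation (unit masses and unit springs are assumptions).
import numpy as np

N = 64
# Dynamical matrix of a ring: 2 on the diagonal, -1 to both neighbors.
D = 2 * np.eye(N) - np.roll(np.eye(N), 1, axis=1) - np.roll(np.eye(N), -1, axis=1)
omega_numeric = np.sort(np.sqrt(np.abs(np.linalg.eigvalsh(D))))

k = 2 * np.pi * np.arange(N) / N                # allowed wavevectors on the ring
omega_analytic = np.sort(2 * np.abs(np.sin(k / 2)))

assert np.allclose(omega_numeric, omega_analytic, atol=1e-6)
```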

Mapping to AI internals

In the phonon picture, the "atoms" of the lattice are words or tokens, and their positions in the embedding space play the role of atomic displacements. A semantic concept like "levels of quality" defines a one-dimensional chain of tokens ordered by meaning: worst, bad, mediocre, okay, good, excellent, best. The embedding vectors of these tokens are the "displacements" of the atoms.

The mathematical justification for this starts with an "old" result in natural language processing. Word embedding models like word2vec and GloVe learn a word embedding matrix $W$ whose $i$-th row is the $d$-dimensional representation of word $i$. The objective, explicit or implicit, is to predict co-occurrence: how often word $j$ appears within a fixed context window of word $i$. As shown by Levy and Goldberg (2014), a skip-gram model with negative sampling (the word2vec architecture) implicitly factorizes the pointwise mutual information (PMI) matrix shifted by a constant:

$$M_{ij} = \mathrm{PMI}(i, j) - \log k = \log \frac{P(i, j)}{P(i)\,P(j)} - \log k,$$

where $P(i, j)$ is the empirical co-occurrence probability, $P(i)$ and $P(j)$ are unigram probabilities, and $k$ is the number of negative samples. GloVe explicitly regresses on $\log P(i, j)$, which is proportional to PMI up to marginal terms. Both models are therefore spectral methods on the PMI matrix.

For words that lie on a semantic continuum, the PMI matrix has additional structure. In the absence of other strongly discriminating features, no particular position on the continuum is special, so the co-occurrence statistics between two words tend to depend only on their separation along that continuum: $M_{ij} \approx f(x_i - x_j)$ for some kernel $f$. This structure, together with the boundary conditions, determines the shape of the eigenmodes.

To see what this implies geometrically, consider the eigendecomposition $M = Q \Lambda Q^\top$, where $Q$ is the orthogonal matrix of eigenvectors ($Q^\top Q = I$) and $\Lambda$ is diagonal with eigenvalues $\lambda_1 \geq \lambda_2 \geq \dots$ The trained embeddings then take the form

$$W = Q\,|\Lambda|^{1/2}.$$

This means that the $k$-th principal component of the word embedding directly encodes the $k$-th eigenmode of $M$. Understanding the geometry of word representations therefore reduces to understanding the spectral structure of the PMI matrix.

Why is that?

The Gram matrix of the embeddings satisfies $WW^\top = M$. We need to find $W$ such that this holds. If we substitute the ansatz $W = Q\,|\Lambda|^{1/2}$, then:

$$WW^\top = Q\,|\Lambda|^{1/2}\,|\Lambda|^{1/2}\,Q^\top = Q\,|\Lambda|\,Q^\top \approx Q\,\Lambda\,Q^\top = M.$$

The last equality is exact only if $M$ is positive semidefinite. In general, some eigenvalues are negative (words that repel each other have negative PMI), so we take the absolute value, which approximates $M$ by treating all eigenvalue contributions as positive.

Geometrically, this means the following: Each word gets a $d$-dimensional embedding vector. The $k$-th coordinate of that vector is:

$$W_{ik} = \sqrt{|\lambda_k|}\;Q_{ik},$$

the value of the $k$-th eigenmode at word $i$, scaled by the square root of the corresponding eigenvalue.

The eigenvectors encode the shape of each mode, defining which words move together. The eigenvalues encode how strongly that pattern is expressed in the corpus. High-eigenvalue modes dominate the embedding (they explain the most variance). Since the embedding matrix is just a rescaled version of $Q$, the principal components of the embedding are just the eigenvectors of $M$.
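As a toy check of this reduction, here is a minimal sketch (the Gaussian kernel and the seven token positions are illustrative assumptions, not the setup of any cited paper): build a translation-invariant PMI-like matrix, eigendecompose it, and verify that $W = Q\,|\Lambda|^{1/2}$ reproduces the Gram matrix.

```python
# Sketch: spectral factorization of a toy translation-invariant PMI matrix.
import numpy as np

x = np.linspace(0, 1, 7)                          # positions on a semantic chain
M = np.exp(-(x[:, None] - x[None, :])**2 / 0.1)   # kernel M_ij = f(x_i - x_j)
lam, Q = np.linalg.eigh(M)                        # M = Q diag(lam) Q^T
W = Q @ np.diag(np.sqrt(np.abs(lam)))             # embeddings, one row per token

# The Gram matrix reproduces M up to the sign of negative eigenvalues;
# this Gaussian kernel is positive semidefinite, so the match is exact.
assert np.allclose(W @ W.T, M)
```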

A natural follow-up question is whether this connection carries over to modern language models, which are far more complex than word2vec or GloVe. Recent work suggests that it does, at least as a first approximation. Cagnetta and Wyart (2024) showed that transformers trained on hierarchical data first learn to exploit short-range token correlations before progressively resolving longer-range ones, effectively building deeper representations of the data structure as the training set grows. Complementarily, Rende et al. (2024) demonstrated that transformers learn many-body token interactions in order of increasing degree, with pairwise co-occurrence statistics of the kind captured by PMI being acquired first. This learning hierarchy suggests that the PMI geometry forms a kind of scaffold on which richer representations are later built. Empirical validation for this picture was recently provided by Karkada et al., who showed that PMI-derived geometric predictions persist in the internal activations of Gemma 2B. The spectral structure of co-occurrence is not just an artifact of shallow embedding models; it appears to be a feature that survives into the deeper layers of modern architectures.

The phonon framework gives a physical interpretation to this spectral structure. A phonon mode corresponds to a direction in embedding space along which tokens are coherently displaced, and that direction is precisely one of the eigenmodes of the co-occurrence matrix. If the tokens are arranged according to a smooth vibrational mode $u_m(x)$, we expect their principal components (PCs) to trace out the shape of that mode: the first PC gives $u_1(x_i)$, the second gives $u_2(x_i)$, and so on. The question becomes: which mode shapes do we actually observe, and which boundary conditions do they correspond to?

Boundary conditions

Different boundary conditions. Karkada et al. have used periodic boundary conditions for the circular structure of months of the year. Different semantic concepts can be represented by alternative boundary conditions.

Different boundary conditions impose different constraints on how vibrations can look at the endpoints of a chain, and each produces a characteristic geometric signature. The starting point is always the same: we look for modes satisfying

$$u''(x) + k^2\,u(x) = 0,$$

which has the general solution $u(x) = A\cos(kx) + B\sin(kx)$, or equivalently $u(x) = C e^{ikx} + D e^{-ikx}$. The boundary conditions select which values of $k$ are allowed and fix the ratio $B/A$, determining the mode shapes. There is a vast number of boundary conditions that may be interesting moving forward, but I will here lay out just a few of the most important ones.


Periodic boundary conditions

The simplest case is periodic boundary conditions, which connect the two endpoints of the chain as if bending it into a loop. The conditions $u(0) = u(L)$ and $u'(0) = u'(L)$ require the function and its derivative to match at both ends. Starting from the complex form of the general solution, both conditions are satisfied when $e^{ikL} = 1$, which forces $k_m = 2\pi m / L$ for integer $m$. The resulting modes are complex exponentials

$$u_m(x) = e^{i k_m x},$$

whose real and imaginary parts give $\cos(k_m x)$ and $\sin(k_m x)$. For the first mode ($m = 1$), the two components satisfy $\cos^2(k_1 x) + \sin^2(k_1 x) = 1$. This gives the geometry shown by Karkada et al., where the month tokens trace out a circle in the $(\mathrm{PC}_1, \mathrm{PC}_2)$ plane. Other semantic concepts that would be natural candidates for this boundary condition include days of the week, hours of the day, and similarly cyclic concepts.
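As a quick numerical illustration (a sketch; the exponential ring kernel is an assumption standing in for real co-occurrence statistics), a circulant kernel on twelve "months" has Fourier eigenmodes, so the top two PCs of the derived embeddings lie on a circle.

```python
# Sketch: periodic BC -> the top two PCs of a circulant kernel trace a circle.
import numpy as np

n = 12
idx = np.arange(n)
dist = np.abs(idx[:, None] - idx[None, :])
ring = np.minimum(dist, n - dist)            # distance around the ring
M = np.exp(-ring / 2.0)                      # circulant PMI-like kernel (assumed)
lam, Q = np.linalg.eigh(M)
W = Q @ np.diag(np.sqrt(np.abs(lam)))        # embeddings, one row per "month"

Wc = W - W.mean(axis=0)                      # mean-center, as PCA does
U, S, _ = np.linalg.svd(Wc, full_matrices=False)
pc = U[:, :2] * S[:2]                        # scores on the top two PCs
r = np.hypot(pc[:, 0], pc[:, 1])
print(r / r.mean())                          # all ~1.0: the months form a circle
```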

Dirichlet boundary conditions

Dirichlet boundary conditions pin the displacement to zero at both endpoints: $u(0) = 0$ and $u(L) = 0$. The condition at $x = 0$ gives $A = 0$, leaving $u(x) = B\sin(kx)$. The condition at $x = L$ requires $\sin(kL) = 0$, so $k_m = m\pi/L$, for which the modes are

$$u_m(x) = \sin\!\left(\frac{m\pi x}{L}\right).$$

The endpoint tokens have zero projection onto every mode. Semantically, this may be interpreted such that the end tokens do not carry any semantic variation, and all representational "richness" lives in the interior of the chain. This might apply to concepts where the extremes are absolute states, something like a scale from "dead" to "alive", where the endpoints are definitionally fixed and the semantic nuance (dying, recovering, thriving…) lives in between.
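The same modes fall out of a discrete calculation, as in this short sketch (my own illustration): the lowest eigenvector of a tridiagonal Laplacian with pinned ends is exactly the sampled sine mode.

```python
# Sketch: Dirichlet BC -> sine eigenvectors of the pinned discrete Laplacian.
import numpy as np

N = 9                                          # interior sites of the chain
D = 2*np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
lam, modes = np.linalg.eigh(D)                 # columns are modes, lowest first

x = np.arange(1, N + 1) / (N + 1)              # interior positions in (0, 1)
u1 = modes[:, 0] * np.sign(modes[0, 0])        # fix the arbitrary overall sign
u1_analytic = np.sin(np.pi * x)                # first Dirichlet mode, m = 1
assert np.allclose(u1, u1_analytic / np.linalg.norm(u1_analytic))
```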

Neumann boundary conditions

Neumann boundary conditions require the derivative to vanish at both endpoints: $u'(0) = 0$ and $u'(L) = 0$. With the derivative of the general solution, $u'(x) = -Ak\sin(kx) + Bk\cos(kx)$, the condition at $x = 0$ implies $B = 0$ (assuming $k \neq 0$). The remaining function $u(x) = A\cos(kx)$ must satisfy $\sin(kL) = 0$, which again requires $k_m = m\pi/L$. The modes are

$$u_m(x) = \cos\!\left(\frac{m\pi x}{L}\right).$$

Note that $m = 0$ is now a valid mode that gives $u_0(x) = 1$, corresponding to a uniform displacement. Since PCA on centered data removes the mean, this constant mode is projected out, and the first PC corresponds to $m = 1$. With $k_m = m\pi/L$, the first nontrivial mode is $u_1(x) = \cos(\pi x/L)$ and the second is $u_2(x) = \cos(2\pi x/L)$. Using $\cos(2\theta) = 2\cos^2\theta - 1$, these satisfy the Chebyshev relation

$$u_2 = 2u_1^2 - 1,$$

which is a parabola in the $(u_1, u_2)$ plane. Neumann conditions correspond to open-ended ordinal scales, where the endpoints are not fixed in value, but the rate of semantic change flattens out. For example, "excellent" is not fundamentally different from "good", but rather just further along the scale. Concepts that might be encoded using this boundary condition include levels of quality (terrible to excellent), levels of certainty (impossible to certain), or emotional valence (miserable to wonderful).
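The Chebyshev relation is easy to verify numerically; this small sketch samples the first two Neumann modes at arbitrary token positions:

```python
# Sketch: the first two Neumann modes satisfy u2 = 2*u1**2 - 1 (a parabola).
import numpy as np

x = np.linspace(0, 1, 7)          # 7 tokens on a chain of length L = 1
u1 = np.cos(np.pi * x)            # first nontrivial Neumann mode
u2 = np.cos(2 * np.pi * x)        # second mode
assert np.allclose(u2, 2 * u1**2 - 1)
```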

Robin boundary conditions

Robin boundary conditions interpolate between Dirichlet and Neumann. The condition $u'(0) = \alpha\,u(0)$ at each endpoint allows the mode to neither vanish nor have zero slope, but to satisfy a weighted combination of the two. Applying this at $x = 0$ with the general solution gives $Bk = \alpha A$, fixing the ratio $B/A = \alpha/k$. The mode shape becomes

$$u(x) = \cos(kx) + \frac{\alpha}{k}\,\sin(kx).$$

Applying the condition at $x = L$ then yields a transcendental equation for the allowed wavenumbers that must be solved numerically. For $\alpha \to 0$, the Neumann cosine modes are recovered; for $\alpha \to \infty$, the Dirichlet sine modes emerge. In PC space, the geometry interpolates smoothly between the Dirichlet and Neumann cases. This might arise for concepts where the endpoints have partial, but not total, semantic anchoring, meaning they exert some pull, but are not absolutely rigid.

Note that the Robin BC is not just a theoretical interpolation. Karkada et al. show that for an exponential co-occurrence kernel with finite length scale $\lambda$, the quantization condition for the allowed wavenumbers is precisely the Robin condition with $\alpha = 1/\lambda$. Pure Neumann BC is only recovered in the limit $\lambda \to \infty$, where the kernel becomes flat. Robin BC is therefore the generic case for open-ended semantic scales (e.g., years of history), with Neumann as the idealized limit of very long-range co-occurrence.
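For the mode shape above, the quantization condition at $x = L$ works out to $(k^2 - \alpha^2)\sin(kL) = 2\alpha k\cos(kL)$. Here is a sketch of the numerical root-finding (the value $\alpha = 0.5$ is an illustrative assumption):

```python
# Sketch: solve the Robin quantization condition for allowed wavenumbers.
import numpy as np
from scipy.optimize import brentq

L, alpha = 1.0, 0.5

def f(k):
    # (k^2 - a^2) sin(kL) - 2 a k cos(kL) = 0; roots are the allowed k.
    return (k**2 - alpha**2) * np.sin(k * L) - 2 * alpha * k * np.cos(k * L)

# Bracket sign changes on a fine grid, then refine with Brent's method.
grid = np.linspace(1e-6, 6 * np.pi, 4000)
vals = f(grid)
roots = [brentq(f, grid[i], grid[i + 1])
         for i in range(len(grid) - 1) if vals[i] * vals[i + 1] < 0]
print(np.array(roots) / np.pi)   # -> integers m in the Neumann limit alpha -> 0
```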

2D Neumann boundary conditions

Many semantic concepts are not one-dimensional. When tokens vary along two independent semantic axes, the appropriate framework is a two-dimensional domain with Neumann conditions on all four edges: $\partial_x u = 0$ at $x \in \{0, L_x\}$ and $\partial_y u = 0$ at $y \in \{0, L_y\}$. The wave equation can be separated as $u(x, y) = X(x)\,Y(y)$, where each factor satisfies the one-dimensional Neumann problem. The modes are therefore products of cosines:

$$u_{mn}(x, y) = \cos\!\left(\frac{m\pi x}{L_x}\right)\cos\!\left(\frac{n\pi y}{L_y}\right).$$

The lowest nontrivial modes are $(1,0)$ (variation along $x$ only), $(0,1)$ (variation along $y$ only), and $(1,1)$ (variation along both). A classic example of a two-dimensional semantic structure is the Russell circumplex model of affect, which organizes emotions along two axes: valence (unpleasant to pleasant) and arousal (calm to excited). If the phonon framework applies, the leading modes should separate these two dimensions, with each principal component capturing the cosine profile along one axis. The mixed mode $(1,1)$ would then encode the interaction between valence and arousal.
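A tiny sketch makes the sign structure of these product modes visible on a coarse grid (purely illustrative):

```python
# Sketch: 2D Neumann modes are products of cosines; the (1,1) mode
# flips sign between adjacent quadrants of the unit square.
import numpy as np

def mode(m, n, x, y):                          # x, y normalized to [0, 1]
    return np.cos(m * np.pi * x) * np.cos(n * np.pi * y)

x, y = np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 4))
print(np.sign(mode(1, 0, x, y)).astype(int))   # varies along x only
print(np.sign(mode(1, 1, x, y)).astype(int))   # checkerboard of quadrants
```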

First experiments

I tested this framework by applying a few boundary condition types to specific semantic concepts, comparing the theoretical predictions to the representations found in GloVe and in Gemma 2B.

Experimental details

Embedding models: We extract representations from two embedding sources. For GloVe, we use the 300-dimensional GloVe-Wiki-Gigaword embeddings (top 400k vocabulary) loaded via gensim; each concept word is looked up directly and its 300-dimensional vector extracted. Note that the theory holds exactly only in the full-rank regime, where the embedding dimension $d$ is at least as large as the rank of $M$. For a vocabulary of 400k words, this condition is far from satisfied at $d = 300$, which means the predicted manifolds are only recovered approximately and noisier results are expected. For the LLM, we use Gemma 2B (18 transformer layers, hidden dimension 2048). Each concept word is placed into a short, scale-specific prompt template that provides semantic context. For example, The temperature feels {word} for the temperature scale, The month of the year is {month}, or The amount of storage is one {word} for data-storage units. Activations are extracted using last-token pooling: for each prompt, the hidden-state vector at the final non-padding token position is taken as the representation (the last token aggregates information from the full prompt context).
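A minimal sketch of the extraction step, assuming the Hugging Face transformers API; the checkpoint name google/gemma-2b and the fixed layer index are placeholders (the actual analysis sweeps layers 1–18):

```python
# Sketch: last-token pooling of hidden states from a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-2b")       # assumed checkpoint
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b",
                                             torch_dtype=torch.bfloat16)
model.eval()

months = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

reps = []
with torch.no_grad():
    for m in months:
        inputs = tok(f"The month of the year is {m}", return_tensors="pt")
        out = model(**inputs, output_hidden_states=True)
        # hidden_states[l] has shape (1, seq_len, hidden); take the final
        # token's vector at an (assumed) fixed layer l = 12.
        reps.append(out.hidden_states[12][0, -1].float())

X = torch.stack(reps)        # (12, 2048) activation matrix for the month chain
```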

PCA and geometric fitting: For each concept scale, the embedding vectors are assembled into an $N \times d$ matrix, mean-centered, and decomposed via truncated SVD. The PCA scores (the columns of $US$) give the projections onto the leading principal directions. We retain only a handful of leading components; the exact number differs between the Neumann fits and the periodic and log-scale fits.

The geometric fit appropriate to each boundary condition type is then applied:

  • Periodic BC: For each pair of PCs and each harmonic mode $m$, a cos/sin ellipse is fit by ordinary least-squares. The best PC pair and harmonic are selected by highest $R^2$.
  • Neumann BC: For each pair of PCs, a rotation-agnostic parabola fit is performed: a coarse angular grid search (180 angles over $[0, \pi)$) followed by golden-section refinement finds the rotation that maximizes the $R^2$ of a degree-2 polynomial. The best PC pair is selected by highest $R^2$ over all combinations (a sketch of this fit follows below).
  • Log scale: For each mode $m$, the target $\sin(m\pi\tilde{x}/2)$ (where $\tilde{x}$ is the normalized log-position) is regressed onto the full set of retained PCs plus an intercept via least-squares.

All three fitting procedures incorporate iterative sigma-clipping (with a fixed sigma threshold and a minimum of 60% inliers) to identify outliers.
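Here is a condensed sketch of the Neumann branch of this pipeline (sigma-clipping and the golden-section refinement are omitted for brevity; the grid resolution matches the description above):

```python
# Sketch: PCA via SVD, then a rotation grid search maximizing the R^2
# of a degree-2 polynomial over all PC pairs.
import numpy as np

def parabola_r2(pair, theta):
    """R^2 of a quadratic fit after rotating a PC pair by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    x = c * pair[:, 0] + s * pair[:, 1]
    y = -s * pair[:, 0] + c * pair[:, 1]
    resid = y - np.polyval(np.polyfit(x, y, deg=2), x)
    return 1 - resid.var() / y.var()

def best_parabola(X, n_pcs=4):
    Xc = X - X.mean(axis=0)                    # mean-center
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_pcs] * S[:n_pcs]          # PCA scores
    thetas = np.linspace(0, np.pi, 180, endpoint=False)
    return max(((i, j, t, parabola_r2(scores[:, [i, j]], t))
                for i in range(n_pcs) for j in range(i + 1, n_pcs)
                for t in thetas),
               key=lambda r: r[-1])            # (PC i, PC j, angle, R^2)
```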

Layer selection for LLM: For Gemma 2B, we perform a layer sweep across layers 1–18 (excluding the raw embedding layer 0). Shown is always the layer with the highest $R^2$.

Periodic concepts

I started by validating the result of Karkada et al. and extending it to further cyclic concepts. Besides the months of the year, I tested days of the week and compass directions (north, northeast, east, ...).

Periodic boundary conditions connect the endpoints of the chain into a loop. The resulting modes are complex exponentials, and the first two principal components trace out a circle.

For all three concepts, the tokens arrange themselves in approximately circular structures, though with varying degrees of accuracy. In GloVe, the circular geometry is noisy: the months form a loose arc rather than a clean circle, and the sequential ordering is largely lost. The days of the week are not really circular in structure either, but the compass directions perform quite well. This is interesting, since the word2vec results reported by Karkada et al. for months were considerably cleaner, suggesting that the circular structure is more clearly expressed in some embedding models than others. This is related to the full-rank regime argument: at $d = 300$, GloVe operates far below the rank of $M$ and the predicted manifolds are only approximately recovered. In Gemma 2B, the picture improves substantially. All three concepts trace recognizable circles with largely correct ordering, particularly the days of the week and compass directions, which show near-perfect circular arrangements.

Experimental validation of periodic boundary conditions for months of the year, days of the week, and compass directions. In GloVe, the circular structure is noisy and the ordering often scrambled, particularly for months (25.2% of variance in the fitted PC pair) and days of the week (84.3%). Compass directions (24.6%) show better agreement. In Gemma 2B, all three concepts trace cleaner circles with largely correct ordering: months at layer 12 (55.6%), days of the week at layer 5 (53.5%), and compass directions at layer 7 (32.3%).

Ordinal scales

The next case tests Neumann boundary conditions on three ordinal concepts: levels of quality (terrible to outstanding), temperature (cold to boiling), and emotional valence (devastated to ecstatic), which are examples of open-ended ordinal scales. The tokens have a clear linear ordering, but neither endpoint is pinned to an absolute, immovable value. The rate of semantic change flattens out at the extremes: the difference between "terrible" and "bad" feels smaller than the difference between "terrible" and "decent". I treated this as a signature of Neumann boundary conditions, where the derivative of the mode vanishes at the endpoints. The meaning does not stop abruptly, but it levels off.

The theoretical setup is rather straightforward. Given $N$ tokens placed at evenly spaced positions along a chain of length $L$, the Neumann eigenmodes evaluated at those positions give the predicted scores for each token. As derived above, the first two nontrivial modes satisfy the Chebyshev relation $u_2 = 2u_1^2 - 1$ (regardless of $N$). Hence, theory predicts the tokens lie on a parabola.

The Neumann boundary condition applies to open-ended ordinal scales: concepts with a natural ordering where neither endpoint is semantically pinned to a fixed value. Theory predicts the embeddings to lie on a parabola.

For all three concepts, both in GloVe and Gemma 2B, the embeddings do fall on the predicted parabolic shape. The parabola does not always appear in the leading two principal components, however; in several cases it is found in higher-order PCs, such as for temperature in GloVe or for quality in Gemma 2B. This is consistent with the framework: the parabolic Chebyshev relation holds between the first two nontrivial Neumann modes regardless of which principal components they correspond to, and the Neumann modes need not be ordered by decreasing variance. Their ranking in PC space depends on the eigenvalue spectrum of the co-occurrence kernel, which is shaped by the training data.

Testing the Neumann prediction for quality, temperature, and emotion valence. The embeddings lie on the theoretically predicted parabola, despite some perturbations in ordering. In GloVe, the parabola appears in different PC pairs: quality (23.9% of variance), temperature (21.7%), and emotion valence (46.1%). In Gemma 2B, the agreement is generally stronger: quality at layer 18 (29.1%), temperature at layer 14 (38.7%), and emotion valence at layer 1 (36.9%). Heatmaps show pairwise cosine similarity in the top-6 principal component subspace.

Interestingly, the ordering along the parabola does not always match the ranking we would intuitively assign. The parabolic geometry is a consequence of the boundary condition alone and holds for any set of tokens governed by Neumann conditions, regardless of their spacing. The ordering, by contrast, depends on the positions along the chain, which are set by the model's co-occurrence statistics rather than by any geometric constraint. When the ordering deviates from our expectation, the most natural explanation is either that near-synonyms are ranked differently by the model's internal statistics than by our intuition, or that the tokens participate in multiple overlapping semantic dimensions whose interference shifts their effective positions along the curve.

Logarithmic scales

I also tested concepts that vary over many orders of magnitude, such as storage capacity (byte to exabyte), temporal duration (second to millennium), and monetary value (cent to trillion). In contrast to the ordinal scales above, the assumption of uniform spacing breaks down here. The semantic distance between a kilobyte and a megabyte is nothing like that between a megabyte and a megabyte plus one kilobyte. What matters is the ratio between adjacent levels, not their difference, so the meaning-carrying structure is logarithmic.

This changes the boundary conditions. Logarithmic scales are asymmetric by nature. The lower end is anchored at a definite smallest value: the scale of storage starts at a byte, monetary value at a cent, and so on. These are not fundamental limits, but they mark the point where each concept begins. The upper end, by contrast, is open-ended; there is no natural maximum to storage or money, so the scale is free there. This asymmetry calls for mixed boundary conditions: Dirichlet at the lower end with $u(0) = 0$ and Neumann at the upper end with $u'(1) = 0$. The resulting modes are

$$u_m(\tilde{x}) = \sin\!\left(\frac{m\pi\tilde{x}}{2}\right),$$

where $\tilde{x}$ is the normalized log-position

$$\tilde{x} = \frac{\log v - \log v_{\min}}{\log v_{\max} - \log v_{\min}},$$

which maps the range of physical values $v$ onto the interval $[0, 1]$. The first mode completes exactly one quarter-cycle, the second a half-cycle, and so on. These are simple sinusoids in the logarithmic coordinate (though they appear as chirps when plotted against the physical variable $v$).

For logarithmic scales, mixed boundary conditions apply: Dirichlet (pinned) at the lower end and Neumann (free) at the upper end. Atoms spaced uniformly in log-space (right) produce qualitatively different mode shapes than linear spacing (left). Theory predicts sinusoidal modes when plotted against the normalized log-position $\tilde{x}$.

The analysis also differs slightly from the ordinal case in how the modes are extracted from the data. For ordinal scales, theory predicts the parabola to live in a specific two-dimensional plane in PC space, so the first two principal components directly reveal the geometry. For logarithmic scales, there is no reason the relevant mode should align with the first principal component. To test the prediction, we search over the leading principal components to find the one whose values most closely follow a quarter-sine profile $\sin(\pi\tilde{x}/2)$, where amplitude, phase, and offset are fit freely. The plots show this best-matching principal component plotted against the normalized log-position $\tilde{x}$, with the theoretical curve overlaid.
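A sketch of this search (the free amplitude and phase are absorbed into a sine/cosine basis, so the fit becomes a plain least-squares problem):

```python
# Sketch: find the PC that best follows a free-phase quarter-sine in the
# normalized log-position. a*sin(t + phi) + c expands into the basis
# [sin(t), cos(t), 1], so amplitude/phase/offset are fit by least squares.
import numpy as np

def best_quarter_sine_pc(scores, values):
    """scores: (N, p) PCA scores; values: physical magnitudes (e.g. bytes)."""
    x = np.log(values)
    x = (x - x.min()) / (x.max() - x.min())    # normalized log-position
    t = np.pi * x / 2                          # quarter-sine argument
    basis = np.column_stack([np.sin(t), np.cos(t), np.ones_like(t)])
    best = None
    for j in range(scores.shape[1]):
        y = scores[:, j]
        coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
        r2 = 1 - ((y - basis @ coef)**2).sum() / ((y - y.mean())**2).sum()
        if best is None or r2 > best[1]:
            best = (j, r2)
    return best                                # (PC index, R^2)
```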

Testing the mixed Dirichlet-Neumann prediction for storage, time, and monetary scales. Shown is the best-fitting principal component projected against the normalized log-position $\tilde{x}$. In GloVe, storage (23.6% of variance), time (29.9%), and money (34.9%) all show good agreement with the predicted mode shapes. In Gemma 2B, time at layer 9 (34.5%) and money at layer 1 (37.6%) show clean fits, while storage at layer 4 is captured by a low-variance component. Heatmaps show pairwise cosine similarity for time in the top-6 principal component subspace.

Testing these three concept sets in both GloVe and Gemma 2B, we find confirmation of the predicted mode shapes. In GloVe, storage and money show good agreement with the first mode, with the tokens tracing the quarter-sine arc when plotted against their log-positions. Time conforms better to the second mode, displaying a half-sine shape. Gemma 2B also shows good agreement with the theoretically predicted shapes, particularly for time and money. Storage in Gemma is the weakest case: the best-matching principal component captures essentially no variance, meaning the sinusoidal pattern, while present, lives in a direction that is geometrically irrelevant to the overall embedding structure. This may indicate that the logarithmic storage hierarchy is not a salient feature of Gemma's representations at the tested layer, or that the signal is distributed too thinly across many components to be recoverable from a single PC.

Also, the fact that different concepts and models favor different modes (e.g., mode 1 for money vs. mode 2 for time) is not entirely unexpected: the theory predicts a family of modes, and which one best matches a given concept depends on the eigenvalue spectrum of the co-occurrence kernel. The eigenvalues determine how much variance each mode captures, and their relative magnitudes are shaped by the training data, the tokenization, and the model architecture.

2D concepts - Russell circumplex

The previous experiments all dealt with one-dimensional semantic scales. But many concepts are inherently two-dimensional. A classic example is the Russell circumplex model of affect, which organizes emotions along two independent axes: valence (unpleasant to pleasant) and arousal (calm to excited). If the phonon framework extends to two dimensions, the 2D Neumann modes should be recoverable from the embeddings of emotion words, with valence and arousal playing the role of the two spatial coordinates.

Interestingly, a very recent study by Anthropic's interpretability team provides independent evidence that this structure is real. They extracted 171 emotion vectors from Claude Sonnet 4.5 and found that the top two principal components of the emotion vector space align with valence and arousal, consistent with Russell's model. In the phonon framework, this is what the two fundamental modes $(1,0)$ and $(0,1)$ predict: the leading principal components should each capture the cosine profile along one semantic axis while being flat along the other.

The 2D Neumann boundary condition applies to concepts varying along two independent semantic axes. Mode (1,0) captures variation along one axis only, while mode (1,1) encodes the interaction between both dimensions. This is tested using data from the NRC VAD Lexicon (Mohammad, 2018), which provides crowdsourced human ratings of valence and arousal for each emotion word.

To test this, I selected 69 emotion words spanning all four quadrants of the circumplex, from high-arousal negative (terrified, enraged) to low-arousal positive (serene, tranquil). Each word is assigned a coordinate $(v, a)$, where $v$ denotes valence and $a$ denotes arousal. Embeddings were extracted from Gemma 2 27B, and for each theoretical mode $(m, n)$ I found the linear combination of the top 10 principal components that best reconstructs the predicted mode shape across all 69 words. A linear combination is used rather than a single PC because the PCA axes of the embedding subspace need not align with the phonon modes. Modes with similar eigenvalues can mix into each other, potentially spreading the semantic signal across several principal components.

Experimental details

Word set and coordinates: I selected 69 emotion words spanning all four quadrants of Russell's circumplex model of affect, from high-arousal negative (terrified, enraged) to low-arousal positive (serene, tranquil). Each word is assigned valence and arousal coordinates from the NRC VAD Lexicon (Mohammad, 2018), a database of human ratings for over 20,000 English words. The ratings were collected via crowdsourcing using Best-Worst Scaling, where annotators were shown sets of four words and asked to identify the one with the highest and lowest intensity along a given dimension.

Embedding extraction: I used Gemma 2 27B in bfloat16 and extracted hidden states using last-token pooling as before. The prompt template was The person is feeling {word}. A layer sweep over all transformer layers (excluding the raw embedding layer) selects the layer that maximizes the summed $R^2$ of the two fitted fundamental modes.

PCA and mode fitting: The embedding matrix is mean-centered and decomposed via truncated SVD, retaining the top 10 principal components. For each candidate 2D Neumann eigenmode $(m, n)$, we compute a target vector $t_i = \cos(m\pi v_i)\cos(n\pi a_i)$ for each word $i$ (with valence $v_i$ and arousal $a_i$ normalized to $[0, 1]$), and then find the linear combination of all 10 PCs (plus an intercept) that best reconstructs it via ordinary least-squares:

$$\hat{t} = \beta_0 + \sum_{j=1}^{10} \beta_j\,\mathrm{PC}_j.$$

So we ask which mixture of PCs best matches a given mode. The fitted value $\hat{t}_i$ is the data quantity shown in the figures, where its sign and magnitude indicate how well each word's embedding aligns with the predicted mode shape. A sweep over the number of retained PCs from 2 to 30 shows that the valence mode is largely captured with only the first few components, while the arousal mode is more diffuse and approaches its asymptote only with substantially more.
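A sketch of this reconstruction step (the rescaling of the NRC VAD ratings to $[0, 1]$ is my assumption about the normalization):

```python
# Sketch: regress the 2D Neumann target cos(m*pi*v)*cos(n*pi*a) onto the
# top-10 PCA scores plus an intercept; the fitted values are what the
# figures color-code, and R^2 measures how well the mode is recovered.
import numpy as np

def fit_mode(scores, v, a, m, n):
    """scores: (N, 10) PCA scores; v, a: valence/arousal in [0, 1]."""
    target = np.cos(m * np.pi * v) * np.cos(n * np.pi * a)
    design = np.column_stack([np.ones_like(target), scores])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    fitted = design @ coef
    r2 = 1 - ((target - fitted)**2).sum() / ((target - target.mean())**2).sum()
    return fitted, r2
```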

Testing the 2D Neumann prediction on 69 emotion words from Russell's circumplex model, using Gemma 2 27B at layer 28. Left: mode $(1,0)$ captures valence, with a clean gradient from negative (red) to positive (blue) emotions largely independent of arousal. Right: mode $(1,1)$ captures the valence-arousal interaction, with the sign of the projection depending on the quadrant of the (valence, arousal) plane. Color indicates the projection onto the best-fitting linear combination of the top 10 principal components for each mode.

The results are shown for two modes. The left panel shows mode $(1,0)$, which varies along valence only: $u_{10} = \cos(\pi\tilde{v})$. The color gradient runs cleanly from left to right, with negative-valence emotions (terrified, angry, depressed) showing strong positive projections and positive-valence emotions (ecstatic, joyful, serene) showing strong negative projections, largely independent of the vertical arousal axis. This is exactly what the theory predicts: mode $(1,0)$ should capture variation along one semantic dimension while being flat along the other. The right panel shows mode $(1,1)$, the interaction mode $u_{11} = \cos(\pi\tilde{v})\cos(\pi\tilde{a})$. Here the color pattern reflects both axes simultaneously: the sign of the projection depends on which quadrant of the valence-arousal plane a word falls in. High-arousal negative emotions and low-arousal positive emotions project in one direction, while the opposite corners project the other way. This is the signature of the product mode, which changes sign between adjacent quadrants of the plane.

Outlook

These are early results, but very encouraging ones. Karkada et al. (2026) showed that the twelve months of the year are embedded as a near-perfect circle in the activations of Gemma 2B, a geometry that follows from periodic boundary conditions applied to a cyclic concept. The mathematics they used is just an excerpt from the broader frameworks developed in solid-state physics to study crystal vibrations. I tested whether this connection extends beyond a single case by applying different boundary conditions to different semantic concepts, and found confirmation for periodic, Neumann, mixed, and 2D Neumann geometries. Much more work is needed to establish how robust and general these patterns are, but the basic premise, that the geometry of word representations can be derived from physical principles applied to the semantic structure of a concept, appears to hold.

I find it noteworthy that just around the time of this writing, a paper appeared on the arXiv (accepted at ICLR 2026) titled "The Lattice Representation Hypothesis of Large Language Models". The author shows that LLM embeddings encode not just individual concepts but the algebraic structure of a concept lattice, with operations like meet and join recoverable directly from the geometry. If this lattice structure is real, the phonon modes I proposed could just be the vibrational modes of exactly that lattice, with the boundary conditions set by the local topology of the concept graph. The lattice would provide the structure, and the phonon framework the dynamics. This is speculative, of course.

I am currently exploring how the phonon picture can be extended to additional boundary condition types, to higher-dimensional semantic structures, and how different modes might interfere with one another and what such interference would actually mean. The goal is to find a theoretically grounded description of how models represent meaning internally. The recent finding by Anthropic's interpretability team that emotion representations in Claude causally drive alignment-relevant behavior, and that these representations organize along the same valence-arousal geometry that the 2D phonon modes predict, suggests that this line of research may have practical consequences beyond interpretability. My hope is that a principled understanding of how concepts like morality or harm are geometrically encoded could offer a path toward understanding and enforcing alignment.


Acknowledgments

Thank you to Dhruva Karkada, Lysander Mawby, and Marvin Koss for feedback :)


