Hi Reddit!
Last month I posted the third part of my series of articles on LLM Neuroanatomy just before I left to go on holiday. Unfortunately, it was a bit sloppy, as I didn't have time to polish it, so I took the article down and deleted the Reddit post.
Over the weekend I revised the article and added the results for Gemma-4 31B! I'm also wrapping up Gemma-4-31B-RYS (the analysis will run overnight), and will release Qwen3.6-35B-RYS this week too.
OK, if you've been following the series, you'll remember that in part II I said LLMs seem to think in a universal language. That was based on a tiny experiment comparing Chinese to English. This time I went deeper.
TL;DR for those who (I know) won't read the blog:
- I expanded the experiment from 2 languages to 8 (EN, ZH, AR, RU, JA, KO, HI, FR) across 5 different models (Qwen3.5-27B, MiniMax M2.5, GLM-4.7, GPT-OSS-120B and Gemma-4 31B). All five show the same thing: in the middle layers, a sentence about photosynthesis in Hindi is closer to photosynthesis in Japanese than it is to cooking in Hindi. Language identity basically vanishes!
- Then I did the harder test: English descriptions, Python functions (single-letter variables only, so no cheating), and LaTeX equations for the same concepts. ½mv², 0.5 * m * v ** 2, and "half the mass times velocity squared" start to converge to the same region in the model's internal space. The universal representation isn't just language-agnostic, it's starting to be modality-agnostic (the results are not quite so strong in these small models; I would love to try this on Opus and ChatGPT-5.4).
- This replicates across dense transformers and MoE architectures from five different orgs. Not a Qwen thing. Not a training artifact, but what seems to be a convergent solution.
- The post connects this to Sapir-Whorf (language shapes thought: nope, not in these models) and Chomsky (universal deep structure: yes, but it's geometry, not grammar). If you're into that kind of nerdy thing, you might like the discussion...
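For anyone who wants the gist of the comparison without reading the blog: once you have layer-wise hidden states, the "same concept across languages beats same language across concepts" check is just a cosine-similarity comparison. Here's a minimal toy sketch with synthetic stand-in vectors (the variable names and the toy noise model are my own illustration, not the actual pipeline in the repo):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two hidden-state vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
d = 64  # toy hidden size

# Toy middle-layer states: the same concept in two languages shares a
# "concept direction" plus small language-specific noise.
concept_photo = rng.normal(size=d)
concept_cook = rng.normal(size=d)

photo_hi = concept_photo + 0.1 * rng.normal(size=d)  # photosynthesis, Hindi
photo_ja = concept_photo + 0.1 * rng.normal(size=d)  # photosynthesis, Japanese
cook_hi = concept_cook + 0.1 * rng.normal(size=d)    # cooking, Hindi

# Same concept across languages should beat same language across concepts.
print(cosine(photo_hi, photo_ja) > cosine(photo_hi, cook_hi))  # True
```

In the real experiment the vectors come from mean-pooled hidden states at each layer, and the effect only shows up strongly in the middle layers.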
Blog with interactive PCA visualisations you can actually play with: https://dnhkng.github.io/posts/sapir-whorf/
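If you'd rather poke at the projections offline, the PCA views are easy to reproduce from a matrix of hidden states with plain NumPy (this is a generic SVD-based sketch, not the exact code from the repo):

```python
import numpy as np

def pca_2d(X):
    """Project rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)                     # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                        # shape (n_samples, 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 64))                   # toy hidden states
proj = pca_2d(X)
print(proj.shape)  # (10, 2)
```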
Code and data: https://github.com/dnhkng/RYS
On the RYS front: still talking with TurboDerp about the ExLlamaV3 pointer-based format for zero-VRAM-overhead layer duplication. No ETA, but it's happening.
Again, play with the widget! It's really cool, I promise!