Hi Reddit!
Last month I posted the third part of my series of articles on LLM Neuroanatomy just before I left to go on holiday. Unfortunately, it was a bit sloppy, as I didn't have time to polish it, so I took the article down and deleted the Reddit post.
Over the weekend I revised the article and added the results for Gemma-4 31B! I'm also wrapping up Gemma-4-31B-RYS (the analysis will run overnight), and will release Qwen3.6-35B-RYS this week too.
OK, if you've been following the series, you'll remember that in part II I said LLMs seem to think in a universal language. That was based on a tiny experiment comparing Chinese to English. This time I went deeper.
TL;DR for those who (I know) won't read the blog:
- I expanded the experiment from 2 languages to 8 (EN, ZH, AR, RU, JA, KO, HI, FR) across 5 different models (Qwen3.5-27B, MiniMax M2.5, GLM-4.7, GPT-OSS-120B and Gemma-4 31B). All five show the same thing: in the middle layers, a sentence about photosynthesis in Hindi is closer to photosynthesis in Japanese than it is to cooking in Hindi. Language identity basically vanishes!
- Then I did the harder test: English descriptions, Python functions (single-letter variables only, so no cheating), and LaTeX equations for the same concepts. ½mv², 0.5 * m * v ** 2, and "half the mass times velocity squared" start to converge to the same region in the model's internal space. The universal representation isn't just language-agnostic, it's starting to be modality-agnostic (the results are not quite so strong in these small models; I would love to try this on Opus and ChatGPT-5.4).
- This replicates across dense transformers and MoE architectures from five different orgs. Not a Qwen thing. Not a training artifact, but what seems to be a convergent solution.
- The post connects this to Sapir-Whorf (language shapes thought: nope, not in these models) and Chomsky (universal deep structure: yes, but it's geometry, not grammar). If you're into that kind of nerdy thing, you might like the discussion...
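For anyone who wants the gist of the comparison without reading the blog: once you have layer-wise hidden states, the "same concept across languages beats same language across concepts" check is just a cosine-similarity comparison. Here's a minimal toy sketch with synthetic stand-in vectors (the variable names and the toy noise model are my own illustration, not the actual pipeline in the repo):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two hidden-state vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
d = 64  # toy hidden size

# Toy middle-layer states: the same concept in two languages shares a
# "concept direction" plus small language-specific noise.
concept_photo = rng.normal(size=d)
concept_cook = rng.normal(size=d)

photo_hi = concept_photo + 0.1 * rng.normal(size=d)  # photosynthesis, Hindi
photo_ja = concept_photo + 0.1 * rng.normal(size=d)  # photosynthesis, Japanese
cook_hi = concept_cook + 0.1 * rng.normal(size=d)    # cooking, Hindi

# Same concept across languages should beat same language across concepts.
print(cosine(photo_hi, photo_ja) > cosine(photo_hi, cook_hi))  # True
```

In the real experiment the vectors come from mean-pooled hidden states at each layer, and the effect only shows up strongly in the middle layers.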
Blog with interactive PCA visualisations you can actually play with: https://dnhkng.github.io/posts/sapir-whorf/
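If you'd rather poke at the projections offline, the PCA views are easy to reproduce from a matrix of hidden states with plain NumPy (this is a generic SVD-based sketch, not the exact code from the repo):

```python
import numpy as np

def pca_2d(X):
    """Project rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)                     # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                        # shape (n_samples, 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 64))                   # toy hidden states
proj = pca_2d(X)
print(proj.shape)  # (10, 2)
```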
Code and data: https://github.com/dnhkng/RYS
On the RYS front: still talking with TurboDerp about the ExLlamaV3 pointer-based format for zero-VRAM-overhead layer duplication. No ETA, but it's happening.
Again, play with the widget! It's really cool, I promise!