LLM Neuroanatomy III – LLMs seem to think in geometry, not language
Hi Reddit!

Last month I posted the third part of my series of articles on LLM Neuroanatomy just before I left to go on holiday 🏝️. Unfortunately, it was a bit 'sloppy', as I didn't have time to polish it, so I took the article down and deleted the Reddit post.

Over the weekend I revised the article and added the results for Gemma-4 31B! I'm also wrapping up Gemma-4-31B-RYS (the analysis will run overnight), and will release Qwen3.6-35B-RYS this week too.

OK, if you have been following the series, you know how in part II I said LLMs seem to think in a universal language? That was based on a tiny experiment comparing Chinese to English. This time I went deeper.

TL;DR for those who (I know) won't read the blog:

  1. I expanded the experiment from 2 languages to 8 (EN, ZH, AR, RU, JA, KO, HI, FR) across 5 different models (Qwen3.5-27B, MiniMax M2.5, GLM-4.7, GPT-OSS-120B and Gemma-4 31B). All five show the same thing: in the middle layers, a sentence about photosynthesis in Hindi is closer to photosynthesis in Japanese than it is to cooking in Hindi. Language identity basically vanishes!
  2. Then I did the harder test: English descriptions, Python functions (single-letter variables only, no cheating), and LaTeX equations for the same concepts. ½mv², 0.5 * m * v ** 2, and "half the mass times velocity squared" start to converge to the same region in the model's internal space. The universal representation isn't just language-agnostic; it's starting to be modality-agnostic (the results are not quite as strong in these small models; I would love to try this on Opus and ChatGPT-5.4)
  3. This replicates across dense transformers and MoE architectures from five different orgs. Not a Qwen thing. Not a training artifact, but what seems to be a convergent solution.
  4. The post connects this to Sapir-Whorf (language shapes thought → nope, not in these models) and Chomsky (universal deep structure → yes, but it's geometry, not grammar). If you're into that kind of nerdy thing, you might like the discussion...
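For anyone who wants the gist of the comparison in point 1 without reading the repo: this is a minimal toy sketch, not the actual analysis code. It assumes you have already extracted mean-pooled middle-layer hidden states for each sentence (e.g. from `output_hidden_states` in your framework of choice); here they are simulated with synthetic vectors that share a "concept" direction plus a small language-identity component.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for mean-pooled middle-layer hidden states.
# In a real run these would come from the model itself.
rng = np.random.default_rng(0)
concept = rng.normal(size=512)    # shared "photosynthesis" direction
lang = rng.normal(size=512)       # residual language-identity direction

photo_hi = concept + 0.1 * lang                 # photosynthesis, Hindi
photo_ja = concept - 0.1 * lang                 # photosynthesis, Japanese
cooking_hi = rng.normal(size=512) + 0.1 * lang  # cooking, Hindi

# The claim: same concept across languages beats
# same language across concepts.
print(cosine(photo_hi, photo_ja))    # high (concept dominates)
print(cosine(photo_hi, cooking_hi))  # near zero (different concepts)
```

With real hidden states the pattern only shows up in the middle layers; early and late layers are much more language-identifiable.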

Blog with interactive PCA visualisations you can actually play with: https://dnhkng.github.io/posts/sapir-whorf/

Code and data: https://github.com/dnhkng/RYS

On the RYS front β€” still talking with TurboDerp about the ExLlamaV3 pointer-based format for zero-VRAM-overhead layer duplication. No ETA but it's happening.

Again, play with the widget! It's really cool, I promise!

submitted by /u/Reddactor
