Built LazyMoE — run 120B LLMs on 8GB RAM with no GPU using lazy expert loading + TurboQuant

I'm a master's student in Germany and I got obsessed with one question:

can you run a model that's "too big" for your hardware?

After weeks of experimenting, I combined three techniques into a working system: lazy MoE expert loading, TurboQuant KV compression, and SSD streaming.
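To make the core idea concrete, here's a minimal sketch of lazy expert loading with SSD streaming. This is my own illustration, not code from the repo: the `LazyExpertCache` class, the per-expert file layout, and all shapes and sizes are assumptions for the example. Experts are memory-mapped from disk only when the router selects them, and a small LRU cache bounds how many stay resident in RAM.

```python
import numpy as np
from collections import OrderedDict
from pathlib import Path

D_MODEL = 64     # toy hidden size
N_EXPERTS = 8    # toy expert count
TOP_K = 2        # experts activated per token
CACHE_SIZE = 4   # max experts resident in RAM at once

class LazyExpertCache:
    """LRU cache of memory-mapped expert weights living on SSD."""

    def __init__(self, weight_dir: Path, cache_size: int = CACHE_SIZE):
        self.weight_dir = weight_dir
        self.cache = OrderedDict()  # expert_id -> mmapped weight matrix
        self.cache_size = cache_size

    def get(self, expert_id: int) -> np.ndarray:
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # refresh LRU order
            return self.cache[expert_id]
        # Cache miss: mmap the expert file. Pages stream in from SSD on
        # first access, so a cold expert costs almost no RAM up front.
        w = np.load(self.weight_dir / f"expert_{expert_id}.npy", mmap_mode="r")
        self.cache[expert_id] = w
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used
        return w

def moe_forward(x: np.ndarray, router_logits: np.ndarray,
                cache: LazyExpertCache) -> np.ndarray:
    """Route one token through its top-k experts; only those get loaded."""
    top = np.argsort(router_logits)[-TOP_K:]
    gates = np.exp(router_logits[top])
    gates /= gates.sum()  # softmax over the selected experts
    out = np.zeros_like(x)
    for g, eid in zip(gates, top):
        out += g * (x @ cache.get(int(eid)))
    return out

if __name__ == "__main__":
    # Write toy experts to disk, then route a token through them lazily.
    d = Path("experts"); d.mkdir(exist_ok=True)
    for i in range(N_EXPERTS):
        np.save(d / f"expert_{i}.npy",
                np.random.randn(D_MODEL, D_MODEL).astype(np.float32))
    x = np.random.randn(D_MODEL).astype(np.float32)
    logits = np.random.randn(N_EXPERTS).astype(np.float32)
    print(moe_forward(x, logits, LazyExpertCache(d))[:4])
```

Because only the top-k experts per token are ever touched, most of a large MoE's weights never need to be in memory at the same time, which is what makes a tight RAM budget plausible at all.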

Here's what it looks like running on my Intel UHD 620 laptop with 8GB RAM and zero GPU...
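The other piece of the memory budget at long context is the KV cache. TurboQuant's actual scheme is described in its paper and is more involved than this; the sketch below only shows the generic idea such methods build on, storing K/V activations as low-bit integers with a per-row scale and zero-point instead of 16-bit floats. All names here are illustrative.

```python
import numpy as np

def quantize_4bit(x: np.ndarray):
    """Per-row asymmetric quantization: map [min, max] onto {0..15}."""
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8) / 15.0
    q = np.clip(np.round((x - lo) / scale), 0, 15).astype(np.uint8)
    return q, scale, lo  # a real kernel would pack two 4-bit values per byte

def dequantize_4bit(q: np.ndarray, scale: np.ndarray, lo: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale + lo

if __name__ == "__main__":
    k = np.random.randn(32, 128).astype(np.float32)  # (tokens, head_dim)
    q, scale, lo = quantize_4bit(k)
    k_hat = dequantize_4bit(q, scale, lo)
    print("max abs error:", np.abs(k - k_hat).max())
```

Packed to 4 bits per value, that's roughly a 4x reduction over an fp16 cache, at the cost of a small reconstruction error per entry.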

GitHub: https://github.com/patilyashvardhan2002-byte/lazy-moe

Would love feedback from this community!

submitted by /u/ReasonableRefuse4996
