Built LazyMoE — run 120B LLMs on 8GB RAM with no GPU using lazy expert loading + TurboQuant

I'm a master's student in Germany and I got obsessed with one question:

can you run a model that's "too big" for your hardware?

After weeks of experimenting, I combined three techniques into a working system: lazy MoE expert loading, TurboQuant KV compression, and SSD streaming.
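To make the core idea concrete, here's a minimal sketch of lazy expert loading with SSD streaming. This is my own illustration, not code from the repo: the `LazyExpertCache` class, the per-expert file layout, and all shapes and sizes are assumptions for the example. Experts are memory-mapped from disk only when the router selects them, and a small LRU cache bounds how many stay resident in RAM.

```python
import numpy as np
from collections import OrderedDict
from pathlib import Path

D_MODEL = 64     # toy hidden size
N_EXPERTS = 8    # toy expert count
TOP_K = 2        # experts activated per token
CACHE_SIZE = 4   # max experts resident in RAM at once

class LazyExpertCache:
    """LRU cache of memory-mapped expert weights living on SSD."""

    def __init__(self, weight_dir: Path, cache_size: int = CACHE_SIZE):
        self.weight_dir = weight_dir
        self.cache = OrderedDict()  # expert_id -> mmapped weight matrix
        self.cache_size = cache_size

    def get(self, expert_id: int) -> np.ndarray:
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # refresh LRU order
            return self.cache[expert_id]
        # Cache miss: mmap the expert file. Pages stream in from SSD on
        # first access, so a cold expert costs almost no RAM up front.
        w = np.load(self.weight_dir / f"expert_{expert_id}.npy", mmap_mode="r")
        self.cache[expert_id] = w
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used
        return w

def moe_forward(x: np.ndarray, router_logits: np.ndarray,
                cache: LazyExpertCache) -> np.ndarray:
    """Route one token through its top-k experts; only those get loaded."""
    top = np.argsort(router_logits)[-TOP_K:]
    gates = np.exp(router_logits[top])
    gates /= gates.sum()  # softmax over the selected experts
    out = np.zeros_like(x)
    for g, eid in zip(gates, top):
        out += g * (x @ cache.get(int(eid)))
    return out

if __name__ == "__main__":
    # Write toy experts to disk, then route a token through them lazily.
    d = Path("experts"); d.mkdir(exist_ok=True)
    for i in range(N_EXPERTS):
        np.save(d / f"expert_{i}.npy",
                np.random.randn(D_MODEL, D_MODEL).astype(np.float32))
    x = np.random.randn(D_MODEL).astype(np.float32)
    logits = np.random.randn(N_EXPERTS).astype(np.float32)
    print(moe_forward(x, logits, LazyExpertCache(d))[:4])
```

Because only the top-k experts per token are ever touched, most of a large MoE's weights never need to be in memory at the same time, which is what makes a tight RAM budget plausible at all.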

Here's what it looks like running on my Intel UHD 620 laptop with 8GB RAM and zero GPU...
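The other piece of the memory budget at long context is the KV cache. TurboQuant's actual scheme is described in its paper and is more involved than this; the sketch below only shows the generic idea such methods build on, storing K/V activations as low-bit integers with a per-row scale and zero-point instead of 16-bit floats. All names here are illustrative.

```python
import numpy as np

def quantize_4bit(x: np.ndarray):
    """Per-row asymmetric quantization: map [min, max] onto {0..15}."""
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8) / 15.0
    q = np.clip(np.round((x - lo) / scale), 0, 15).astype(np.uint8)
    return q, scale, lo  # a real kernel would pack two 4-bit values per byte

def dequantize_4bit(q: np.ndarray, scale: np.ndarray, lo: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale + lo

if __name__ == "__main__":
    k = np.random.randn(32, 128).astype(np.float32)  # (tokens, head_dim)
    q, scale, lo = quantize_4bit(k)
    k_hat = dequantize_4bit(q, scale, lo)
    print("max abs error:", np.abs(k - k_hat).max())
```

Packed to 4 bits per value, that's roughly a 4x reduction over an fp16 cache, at the cost of a small reconstruction error per entry.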

GitHub: https://github.com/patilyashvardhan2002-byte/lazy-moe

Would love feedback from this community!

submitted by /u/ReasonableRefuse4996
