Built LazyMoE — run 120B LLMs on 8GB RAM with no GPU using lazy expert loading + TurboQuant
I'm a master's student in Germany, and I got obsessed with one question: can you run a model that's "too big" for your hardware? After weeks of experimenting, I combined three techniques — lazy MoE expert loading, TurboQuant KV co…
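To make the first technique concrete: the core idea of lazy MoE expert loading is that a mixture-of-experts model only activates a few experts per token, so you can keep the full set of expert weights on disk and pull in just the ones the router picks, evicting old ones when RAM runs low. Here is a minimal, hypothetical sketch of that idea as an LRU expert cache — the class name, per-expert `.npy` file layout, and `max_resident` budget are all illustrative assumptions, not LazyMoE's actual implementation:

```python
import numpy as np
from collections import OrderedDict

class LazyExpertCache:
    """Load MoE expert weights from disk only when the router selects them.

    Hypothetical sketch: assumes each expert's weights live in their own
    .npy file (expert_0.npy, expert_1.npy, ...). The real on-disk format
    and eviction policy of LazyMoE may differ.
    """

    def __init__(self, expert_paths, max_resident=4):
        self.expert_paths = expert_paths   # expert id -> file path on disk
        self.max_resident = max_resident   # RAM budget, in number of experts
        self.cache = OrderedDict()         # LRU order: expert id -> weights

    def get(self, expert_id):
        if expert_id in self.cache:
            # Cache hit: mark as most recently used and return.
            self.cache.move_to_end(expert_id)
            return self.cache[expert_id]
        if len(self.cache) >= self.max_resident:
            # Cache full: evict the least recently used expert.
            self.cache.popitem(last=False)
        # Memory-map the weights so pages are faulted in on demand
        # rather than read eagerly into RAM.
        weights = np.load(self.expert_paths[expert_id], mmap_mode="r")
        self.cache[expert_id] = weights
        return weights
```

With this scheme, peak memory is bounded by `max_resident` experts plus the shared (non-expert) layers, which is what lets a model far larger than RAM run at all — at the cost of disk reads whenever the router picks a cold expert.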